H-FARM AI's Newsletter

Claude Sonnet 5 ships at $2 per million tokens

PLUS: Google ships Nano Banana 2 Lite and Omni Flash, Etched raises $800M to challenge Nvidia chips, the U.S. lifts export controls on Claude Fable 5, and OpenAI cuts inference costs 50% via compute multiplier.

July 01, 2026

1️⃣ Anthropic launches Claude Sonnet 5 with near-Opus-level agentic performance at $2/$10 introductory API pricing

2️⃣ Google drops Nano Banana 2 Lite and Gemini Omni Flash, enabling image-to-video workflows at $0.034/image and $0.10/sec

3️⃣ AI chip startup Etched raises $800M at a $5B valuation, revealing working inference hardware and $1B in customer contracts

The U.S. Department of Commerce lifts export controls on Claude Fable 5 and Mythos 5, restoring access for vetted domestic organizations
OpenAI reportedly discovers a “compute multiplier” technique that cuts inference costs by more than 50%, paired with its new Jalapeño chip

MAIN AI UPDATES / 1st July 2026

🤖 Claude Sonnet 5 ships at $2 per million tokens 🤖
Anthropic's newest Sonnet-tier model targets agentic workloads with aggressive pricing.

Anthropic has released Claude Sonnet 5, a model the company says approaches Opus 4.8 capabilities while substantially outperforming Sonnet 4.6 across planning, tool use, coding, and knowledge tasks. The model can operate a browser or terminal and execute extended multi-step jobs autonomously. Notably, Anthropic says the model scored lower than its predecessor on cybersecurity benchmarks because the company "did not deliberately train" on those tasks. Sonnet 5 is available across all plans at introductory API pricing of $2/$10 per million input/output tokens until August 31, rising to $3/$15 afterward, which puts direct pressure on OpenAI and Google on agentic pricing. It's one of the most significant model drops of the quarter.
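For a rough sense of what the price change means for a bill, here is a minimal Python sketch using the per-million-token prices quoted above. The workload size (5M input tokens, 1M output tokens) and the names `job_cost`, `INTRO`, and `STANDARD` are illustrative assumptions, not figures or identifiers from Anthropic, and a real invoice may include factors this ignores.

```python
# Prices in USD per million tokens, as quoted in the story above.
INTRO = {"input": 2.00, "output": 10.00}     # introductory, until August 31
STANDARD = {"input": 3.00, "output": 15.00}  # after the introductory period


def job_cost(input_tokens: int, output_tokens: int, prices: dict) -> float:
    """Return the USD cost of a job at the given per-million-token prices."""
    total = input_tokens * prices["input"] + output_tokens * prices["output"]
    return total / 1_000_000


# Hypothetical agentic workload: 5M input tokens and 1M output tokens.
intro_cost = job_cost(5_000_000, 1_000_000, INTRO)
standard_cost = job_cost(5_000_000, 1_000_000, STANDARD)

print(f"Introductory: ${intro_cost:.2f}")
print(f"Standard:     ${standard_cost:.2f}")
print(f"Increase:     {(standard_cost / intro_cost - 1):.0%}")
```

Because both the input and output prices rise by the same ratio, the same workload costs 50% more once the introductory window closes, whatever its input/output mix.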

🎨 Google ships Nano Banana 2 Lite and Omni Flash 🎨
Two new models bring speed and low-cost pricing to multimodal content creation.

Google has released two generative media models: Nano Banana 2 Lite, its fastest image model, and Gemini Omni Flash for video generation and editing. Lite produces an image in just four seconds at $0.034 per image, while Omni Flash generates and edits 10-second video clips at $0.10/sec. Omni Flash tops text-to-video leaderboards and trails only Seedance 2.0 in editing benchmarks. The key capability is chaining: users create an image with Lite, feed it into Omni Flash, and animate it into video in a single workflow, which substantially lowers multimodal production costs. Both models are live through AI Studio, the Gemini API, and Google's enterprise and consumer products.
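To put the chained workflow in dollar terms, here is a small Python sketch built on the two prices quoted above. It assumes one Lite image per clip and video billed linearly per second of output. Both are simplifying assumptions, and the function name `chained_clip_cost` is hypothetical.

```python
# Prices in USD, as quoted in the story above.
IMAGE_PRICE = 0.034        # Nano Banana 2 Lite, per image
VIDEO_PRICE_PER_SEC = 0.10  # Gemini Omni Flash, per second of video


def chained_clip_cost(clip_seconds: float, images: int = 1) -> float:
    """Estimate the cost of generating `images` stills and animating
    them into one clip of `clip_seconds` seconds (assumed linear billing)."""
    return images * IMAGE_PRICE + clip_seconds * VIDEO_PRICE_PER_SEC


# One image animated into a single 10-second clip.
one_clip = chained_clip_cost(10)
print(f"One image + 10s clip: ${one_clip:.3f}")

# A hypothetical batch of 100 such clips.
print(f"100 clips:            ${100 * one_clip:.2f}")
```

Under these assumptions the video step dominates: the still image is a few cents, while ten seconds of video is about a dollar.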

💰 Etched raises $800M to challenge Nvidia chips 💰
The startup's $5B valuation signals a credible challenge to Nvidia's GPUs.

AI chip startup Etched has closed $800 million in new funding at a $5 billion valuation, one of the largest raises for an AI hardware company to date. Alongside the funding, Etched revealed a working inference chip with a full server rack deployment and disclosed $1 billion in customer contracts. Working silicon plus real revenue commitments signal serious competitive pressure on Nvidia. The company is betting that purpose-built chips can outperform general-purpose GPUs on transformer workloads, positioning itself squarely in the booming AI inference hardware market as demand for cost-efficient compute continues to surge.

INTERESTING TO KNOW

🏛️ U.S. lifts export controls on Claude Fable 5 🏛️

In a major regulatory shift, the U.S. Department of Commerce has formally lifted export controls on Anthropic's most powerful models, Claude Fable 5 and Mythos 5. Anthropic has begun restoring Mythos 5 access for roughly 100 vetted U.S. organizations, while Fable 5 could return as soon as this week. Restrictions on foreign-national access remain in place, so the change reshapes how frontier models reach domestic organizations while keeping national-security guardrails intact.

⚡ OpenAI cuts inference costs 50% via compute multiplier ⚡

Pricing could shift across OpenAI's products after the company reportedly discovered a technique it calls a “compute multiplier” that more than halves inference costs, per The Information. The efficiency gain arrives alongside the debut of OpenAI's custom Jalapeño chip, suggesting parallel software and hardware optimization. For now, the savings apply to guest ChatGPT users with limited features. If deployed broadly, halved inference costs could unlock more aggressive API pricing.

📩 Have questions or feedback? Just reply to this email. We’d love to hear from you!
