AI Inference Money War | AI Field Notes #53

A fortress-like server with a coin meter looms over people locked outside a gate, while a few slip away to build their own small machines.

The cost of running AI models, separate from training them, is becoming the fight that shapes what developers can ship, and this week the money moved toward open-weight and on-prem options. Baseten raised $1.5 billion at up to a $13 billion valuation for inference infrastructure, Switzerland's Prem AI pulled in $100 million for private on-prem deployments, and China's MiniMax released open weights for a frontier model developers can self-host. Away from the model race, 2026 layoffs crossed 185,000 with 56 percent citing AI, GitHub Copilot switched to metered token-based billing, and Cursor's Composer 2.5 pushed coding further toward supervising agents. It was a quiet week for new frontier launches, so this issue reaches across the past ten days.


AI Industry · TechCrunch

AI inference: Baseten nears $1.5B raise at up to $13B valuation

Analysis: Inference, the step where a trained model actually answers a request, just got priced like its own industry. Baseten is closing a $1.5 billion round at a valuation between $11 and $13 billion, more than double its $5 billion mark from January, when annualized revenue ran near $600 million. The company rents capacity from roughly 20 cloud providers and routes each request to the cheapest model that can handle it, often an open-weight one. Spark Capital, Altimeter, Sands, and Wellington co-led. The signal underneath is that the money is shifting from training models to serving them at scale.
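The "cheapest model that can handle it" idea can be sketched in a few lines. This is a minimal illustration, not Baseten's actual router: the `ModelOption` type, the integer `capability` tiers, the model names, and every price in `CATALOG` are hypothetical placeholders chosen only to show the selection rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str                             # hypothetical model label
    usd_per_million_output_tokens: float  # hypothetical price
    capability: int                       # hypothetical tier, higher = stronger
    open_weight: bool


def cheapest_capable(options, required_capability):
    """Return the lowest-priced option whose capability tier is high enough."""
    capable = [m for m in options if m.capability >= required_capability]
    if not capable:
        raise ValueError("no model meets the required capability")
    return min(capable, key=lambda m: m.usd_per_million_output_tokens)


# Illustrative catalog only; none of these names or prices come from the article.
CATALOG = [
    ModelOption("small-open", 0.20, 1, True),
    ModelOption("large-open", 0.90, 2, True),
    ModelOption("frontier-closed", 10.00, 3, False),
]

if __name__ == "__main__":
    choice = cheapest_capable(CATALOG, required_capability=2)
    print(choice.name, choice.open_weight)
```

A real router would presumably also weigh latency, provider availability, and how a request's difficulty is estimated, none of which the item describes.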

Read the source for AI inference: Baseten nears $1.5B raise at up to $13B valuation · TechCrunch · techcrunch.com

AI Industry · Bloomberg

Private AI: Swiss startup Prem raises $100M as export bans bite

Analysis: Export bans just made owning your AI look smarter than renting it. Prem AI, a Swiss startup that helps hedge funds and law firms run models on their own infrastructure, is raising $100 million in a Series A at a valuation of at least $500 million, up from a $200 million bridge round. The raise landed the same week it launched Fluso, a workspace built for organizations handling their most sensitive data, and the same week Anthropic's US block reminded everyone that hosted access can vanish. Backers include Jim Breyer, Index Ventures, and Sequoia China co-founder Fan Zhang. Sovereign and on-prem AI is having a moment for a concrete reason.

Read the source for Private AI: Swiss startup Prem raises $100M as export bans bite · Bloomberg · bloomberg.com

AI Industry · Spectrum AI Lab

AI coding cost: GitHub Copilot moves to metered, token-based billing

Analysis: GitHub quietly changed what writing code with AI costs. On June 1 Copilot moved from charging per request to usage-based billing, metered in "AI Credits" worth a cent each, so the bill now tracks the model you pick and the tokens you burn rather than a flat per-request fee. New signups for paid tiers were paused in late April ahead of the switch. For a small team that leaned on a fixed monthly number, the cost of an agent reasoning over a large codebase can now swing with every long session. Predictable pricing was part of the original pitch, and it just got less predictable.
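To see why metered billing swings with session length, here is a rough sketch of the arithmetic. The only figure taken from the item is that one AI Credit is worth a cent. The per-model credit rates and the model labels below are invented for illustration and are not GitHub's published prices.

```python
USD_PER_CREDIT = 0.01  # from the item: one AI Credit is worth a cent

# Hypothetical credit rates per million tokens; NOT GitHub's real price list.
CREDITS_PER_MILLION_TOKENS = {
    "light-model": 50.0,
    "heavy-model": 500.0,
}


def session_cost_usd(model, tokens):
    """Estimate one session's cost from its model and total token count."""
    credits = tokens / 1_000_000 * CREDITS_PER_MILLION_TOKENS[model]
    return credits * USD_PER_CREDIT


if __name__ == "__main__":
    # A short chat on a light model versus a long agent run on a heavy one.
    print(f"{session_cost_usd('light-model', 200_000):.2f}")
    print(f"{session_cost_usd('heavy-model', 2_000_000):.2f}")
```

Under these made-up rates, the long agent session costs a hundred times the short one, which is the kind of spread a flat per-request fee used to hide.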

Read the source for AI coding cost: GitHub Copilot moves to metered, token-based billing · Spectrum AI Lab · spectrumailab.com

AI Models · Tech Times

Open weights: MiniMax M3 lands as a self-host hedge against the ban

Analysis: Timing turned a Chinese lab into the obvious hedge. MiniMax released M3 as an open-weight model on June 1, posted the weights to Hugging Face by June 7, and published its sparse-attention method (a design that skips most of the math at long context to run faster) on arXiv on June 11. M3 claims 59 percent on SWE-Bench Pro, a hard real-world coding test, with a 1-million-token context window and decoding roughly 15 times faster than its predecessor. As the US ban locked foreign developers out of Anthropic's frontier models, an open-weight rival they can download and run themselves looked less like a curiosity and more like an exit.

Read the source for Open weights: MiniMax M3 lands as a self-host hedge against the ban · Tech Times · techtimes.com

LLM Evals · LLM-Stats SWE-bench Verified

Open coding models: DeepSeek-V4-Pro tops the open SWE-bench leaderboard

Analysis: The open-weight escape hatch already has a leader. DeepSeek-V4-Pro, a 1.6-trillion-parameter mixture-of-experts model (a design that activates only part of itself per request to cut cost) released under the permissive MIT license, now tops the open-source rankings on SWE-Bench Verified, a test of fixing real software bugs, with a score of roughly 0.81, meaning about 81 percent of tasks resolved. After DeepSeek made a 75 percent price cut permanent in May, its rate sits near $0.87 per million output tokens, a fraction of what frontier closed models charge. With the US ban pushing developers toward weights they can host themselves, the cheapest capable open model stops reading as a budget compromise.
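The pricing figures lend themselves to a quick back-of-envelope check. The sketch below uses the two numbers from the item ($0.87 per million output tokens and a 75 percent cut) and assumes, which the item does not state, that the cut applied directly to this output rate. The 50-million-token workload is an arbitrary example.

```python
USD_PER_MILLION_OUTPUT_TOKENS = 0.87  # figure quoted in the item
PRICE_CUT = 0.75                      # 75 percent cut quoted in the item


def output_cost_usd(output_tokens, usd_per_million=USD_PER_MILLION_OUTPUT_TOKENS):
    """Cost of generating a given number of output tokens at a per-million rate."""
    return output_tokens / 1_000_000 * usd_per_million


def implied_pre_cut_price(current_price, cut):
    """Price before a fractional cut, ASSUMING the cut applied to this rate."""
    return current_price / (1 - cut)


if __name__ == "__main__":
    before = implied_pre_cut_price(USD_PER_MILLION_OUTPUT_TOKENS, PRICE_CUT)
    print(f"implied pre-cut rate: {before:.2f} USD per million output tokens")
    print(f"50M output tokens now: {output_cost_usd(50_000_000):.2f} USD")
    print(f"50M output tokens before: {output_cost_usd(50_000_000, before):.2f} USD")
```

Input-token charges and any self-hosting hardware costs are left out, so this bounds only the hosted output side of a bill.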

Read the source for Open coding models: DeepSeek-V4-Pro tops the open SWE-bench leaderboa… · LLM-Stats SWE-bench Verified · llm-stats.com

AI Industry · CNBC

On-device AI: Nvidia's RTX Spark Superchip targets laptops, not servers

Analysis: Nvidia wants the chip in your next laptop, not only the ones humming in data centers. At Computex on June 1, Jensen Huang unveiled the RTX Spark Superchip, a processor aimed at Windows laptops and desktops that will ship this fall in machines from Dell and Lenovo. The move pushes Nvidia straight into territory held by Intel and AMD and extends its reach across every layer of the AI stack, from the cloud down to the keyboard. The pitch is local AI: running models on the device instead of paying for a server round-trip. Whether buyers want on-device inference enough to switch chips is the open question.

Read the source for On-device AI: Nvidia's RTX Spark Superchip targets laptops, not serve… · CNBC · cnbc.com