AI Field Notes by Michael Nemtsev

AI Inference Money War | AI Field Notes #53

Image: A fortress-like server with a coin meter looms over people locked outside a gate, while a few slip away to build their own small machines.

The cost of running AI models, as distinct from training them, is becoming the fight that shapes what developers can ship, and this week the money moved toward open-weight and on-prem options. Baseten raised $1.5 billion at up to a $13 billion valuation for inference infrastructure, Switzerland's Prem AI pulled in $100 million for private on-prem deployments, and China's MiniMax released an open-weight frontier model that developers can self-host. Away from the model race, 2026 layoffs crossed 185,000 with 56 percent citing AI, GitHub Copilot switched to metered token-based billing, and Cursor's Composer 2.5 pushed coding further toward supervising agents. It was a quiet week for new frontier launches, so this issue reaches back across the past ten days.

AI Industry · TechCrunch

AI inference: Baseten nears $1.5B raise at up to $13B valuation

Analysis: Inference, the step where a trained model actually answers a request, just got priced like its own industry. Baseten is closing a $1.5 billion round at a valuation between $11 billion and $13 billion, more than double its $5 billion mark from January, when annualized revenue ran near $600 million. The company rents capacity from roughly 20 cloud providers and routes each request to the cheapest model that can handle it, often an open-weight one. Spark Capital, Altimeter, Sands, and Wellington co-led. The underlying signal is that money is shifting from training models to serving them at scale.
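The routing idea described here, sending each request to the cheapest model that can handle it, can be sketched in a few lines. This is a minimal illustration, not Baseten's implementation: the model names, the prices, and the single `quality` score are all hypothetical, and a real router would weigh latency, context length, and availability as well.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str                      # hypothetical model identifier
    usd_per_million_tokens: float  # hypothetical price
    quality: int                   # hypothetical capability score, higher is better


def pick_cheapest_capable(options, required_quality):
    """Return the lowest-priced option whose quality meets the requirement."""
    capable = [o for o in options if o.quality >= required_quality]
    if not capable:
        raise ValueError("no model meets the required quality")
    return min(capable, key=lambda o: o.usd_per_million_tokens)


# Hypothetical catalog: two open-weight models and one closed frontier model.
CATALOG = [
    ModelOption("open-small", 0.20, quality=2),
    ModelOption("open-large", 0.90, quality=4),
    ModelOption("closed-frontier", 15.00, quality=5),
]

# A mid-difficulty request lands on the open-weight model, not the frontier one.
print(pick_cheapest_capable(CATALOG, required_quality=3).name)  # prints "open-large"
```

Only requests that need the top quality tier fall through to the expensive closed model, which is why a router like this tends to favor open weights.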

AI Industry · Bloomberg

Private AI: Swiss startup Prem raises $100M as export bans bite

Analysis: Export bans just made owning your AI look smarter than renting it. Prem AI, a Swiss startup that helps hedge funds and law firms run models on their own infrastructure, is raising $100 million in a Series A at a valuation of at least $500 million, up from a $200 million bridge round. The raise landed the same week it launched Fluso, a workspace built for organizations handling their most sensitive data, and the same week Anthropic's US block reminded everyone that hosted access can vanish. Backers include Jim Breyer, Index Ventures, and Sequoia China co-founder Fan Zhang. Sovereign and on-prem AI is having a moment for a concrete reason.

AI Industry · Spectrum AI Lab

AI coding cost: GitHub Copilot moves to metered, token-based billing

Analysis: GitHub quietly changed what writing code with AI costs. On June 1 Copilot moved from charging per request to usage-based billing, metered in "AI Credits" worth a cent each, so the bill now tracks the model you pick and the tokens you burn rather than a flat per-request fee. New signups for paid tiers were paused in late April ahead of the switch. For a small team that budgeted around a fixed monthly number, the cost of an agent reasoning over a large codebase can now swing with every long session. Predictable pricing was part of the original pitch, and it just got less predictable.
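To see why a metered bill swings with session length, here is a rough sketch of the arithmetic. The only figure taken from the item above is that one AI Credit is worth a cent; the per-token rates, the token counts, and the function name are hypothetical and do not reflect GitHub's actual rate card.

```python
CREDITS_PER_USD = 100  # from the item above: one AI Credit is worth one cent


def session_credits(input_tokens, output_tokens, in_rate, out_rate):
    """Credits for one session; rates are credits per million tokens (hypothetical)."""
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


# Hypothetical rates for a single model, in credits per million tokens.
IN_RATE, OUT_RATE = 300, 1500

quick_fix = session_credits(20_000, 2_000, IN_RATE, OUT_RATE)
long_agent_run = session_credits(2_000_000, 100_000, IN_RATE, OUT_RATE)

print(f"quick fix: {quick_fix:.0f} credits (${quick_fix / CREDITS_PER_USD:.2f})")
print(f"long agent run: {long_agent_run:.0f} credits (${long_agent_run / CREDITS_PER_USD:.2f})")
# prints:
# quick fix: 9 credits ($0.09)
# long agent run: 750 credits ($7.50)
```

Under these made-up rates the long agent run costs roughly 80 times the quick fix, a gap that a flat per-request fee would have hidden.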

AI Models · Tech Times

Open weights: MiniMax M3 lands as a self-host hedge against the ban

Analysis: Timing turned a Chinese lab into the obvious hedge. MiniMax released M3 as an open-weight model on June 1, posted the weights to Hugging Face by June 7, and published its sparse-attention method (a design that skips most of the math at long context to run faster) on arXiv on June 11. MiniMax claims M3 scores 59 percent on SWE-bench Pro, a hard real-world coding test, with a 1-million-token context window and decoding roughly 15 times faster than its predecessor. As the US ban locked foreign developers out of Anthropic's frontier models, an open-weight rival they can download and run themselves looked less like a curiosity and more like an exit.

LLM Evals · LLM-Stats SWE-bench Verified

Open coding models: DeepSeek-V4-Pro tops the open SWE-bench leaderboard

Analysis: The open-weight escape hatch already has a leader. DeepSeek-V4-Pro, a 1.6-trillion-parameter mixture-of-experts model (a design that activates only part of itself per request to cut cost) released under the permissive MIT license, now tops the open-source rankings on SWE-bench Verified, a test of fixing real software bugs, with a score of roughly 0.81. After DeepSeek made a 75 percent price cut permanent in May, its rate sits near $0.87 per million output tokens, a fraction of what frontier closed models charge. With the US ban pushing developers toward weights they can host themselves, the cheapest capable open model stops reading as a budget compromise.
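For a sense of scale, the figures above can be run through some quick arithmetic. The $0.87 rate and the 75 percent cut come from the item; the closed-model rate and the monthly volume are hypothetical placeholders, and the pre-cut rate is only implied, on the assumption that the cut applied to this same output-token price.

```python
def output_cost_usd(output_tokens, usd_per_million):
    """Cost of generating `output_tokens` at a per-million-token rate."""
    return output_tokens / 1_000_000 * usd_per_million


OPEN_RATE = 0.87   # from the item above, USD per million output tokens
PRICE_CUT = 0.75   # from the item above

# Assumption: the 75 percent cut applied to this same output-token rate.
implied_pre_cut_rate = OPEN_RATE / (1 - PRICE_CUT)

HYPOTHETICAL_CLOSED_RATE = 15.00  # placeholder, not a quoted price
MONTHLY_OUTPUT_TOKENS = 500_000_000  # placeholder workload

open_bill = output_cost_usd(MONTHLY_OUTPUT_TOKENS, OPEN_RATE)
closed_bill = output_cost_usd(MONTHLY_OUTPUT_TOKENS, HYPOTHETICAL_CLOSED_RATE)

print(f"implied pre-cut rate: ${implied_pre_cut_rate:.2f} per million")
print(f"open-weight bill: ${open_bill:,.2f}")
print(f"closed-model bill: ${closed_bill:,.2f}")
```

The comparison covers API pricing only; self-hosting the weights swaps the per-token fee for hardware and operations costs that this sketch does not model.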

AI Industry · CNBC

On-device AI: Nvidia's RTX Spark Superchip targets laptops, not servers

Analysis: Nvidia wants its chips in your next laptop, not only in the servers humming in data centers. At Computex on June 1, Jensen Huang unveiled the RTX Spark Superchip, a processor aimed at Windows laptops and desktops that will ship this fall in machines from Dell and Lenovo. The move pushes Nvidia straight into territory held by Intel and AMD and extends its reach across every layer of the AI stack, from the cloud down to the keyboard. The pitch is local AI: running models on the device instead of paying for a server round-trip. Whether buyers want on-device inference enough to switch chips is the open question.
