AI News | Field Notes by Michael Nemtsev

Open Models Close the Gap | AI Field Notes #49


Open-weight coding models closed the gap with the frontier this week: Moonshot's Kimi K2.7 beat Claude Opus 4.8 on tool use at about 95 cents per million input tokens, and Cohere shipped a 30-billion-parameter model built to run inside a company's own walls. OpenAI bought Ona to give its Codex agents a place to execute, and Nvidia put its name on the strongest US open-weights model yet. Away from the tools desk, Anthropic said Claude now writes 80 percent of its own merged code, 2026 tech layoffs passed 180,000 with most blaming AI, and the EU AI Act's labeling rules land August 2. The cheapest capable option is increasingly the open one, and the cost is shifting from licenses to compute and to jobs.

AI Models · Build Fast with AI

Kimi K2.7 Code: open model beats Claude Opus on tool use at a tenth the price

Analysis: An open-weight model from China just outscored the frontier on the benchmark developers actually feel. Moonshot AI released Kimi K2.7 Code on June 12, and it posts 81.1 percent on a tool-use test (whether a model can correctly call external tools and APIs) against 76.4 percent for Claude Opus 4.8. It runs for about 95 cents per million input tokens, trims reasoning tokens by roughly 30 percent, and ships as open weights anyone can download from Hugging Face. The distance between a paid frontier model and a free download keeps closing on the numbers that decide what a team builds on.

AI Agents · Radical Data Science

Cohere North Mini Code: a 30B coding model built to run inside your own walls

Analysis: Cohere, the Toronto AI lab that builds for business deployments, shipped North Mini Code on June 10 under an Apache 2.0 license (free to use, change, and sell on, including commercially). It is a 30-billion-parameter mixture-of-experts model (a design that wakes only part of itself per request, here about 3 billion parameters, to cut cost) aimed at teams that cannot send source code to someone else's cloud. The pitch is sovereign AI: a bank or a hospital running a capable coding model on hardware it controls. Small and open is becoming the default answer for anyone who treats their code as too sensitive to ship offsite.

AI Agents · Radical Data Science

OpenAI buys Ona to give Codex agents a place to actually run

Analysis: OpenAI acquired Ona on June 12 to fold secure cloud execution and orchestration into Codex, its coding agent. The prize is plumbing rather than intelligence: a long-running agent needs a persistent, customer-controlled environment where it can edit files, run tests, and hold state across hours of work without a human watching the session. Buying that capability instead of building it points at where the real bottleneck sits now. The hard part of agentic coding stopped being how smart the model is and became how to give it a sandbox that outlives a single request.

AI Models · Radical Data Science

Nvidia's Nemotron 3 Ultra ships as the strongest US open-weights model yet

Analysis: Nvidia released Nemotron 3 Ultra on June 1, a 550-billion-parameter mixture-of-experts model (only about 55 billion parameters fire per request to hold down cost) that the independent benchmarker Artificial Analysis rated the most capable open-weights model yet from a US company. Open weights mean a team can download it, run it on its own machines, and tune it without asking anyone's permission. The chipmaker putting its name on a frontier-class open model is a tell: Nvidia sells more hardware when capable models are free, because someone still has to buy the silicon to run them.
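For a sense of scale, here is a minimal sketch of the active-parameter arithmetic, using only the figures quoted in this issue. The function name and the table of models are illustrative, not anything published by Nvidia or Cohere.

```python
def active_fraction(active_billion: float, total_billion: float) -> float:
    """Share of a mixture-of-experts model's parameters used per request."""
    if total_billion <= 0 or not 0 <= active_billion <= total_billion:
        raise ValueError("need total > 0 and 0 <= active <= total")
    return active_billion / total_billion


# Figures as quoted in this issue: (active, total), in billions of parameters.
models = {
    "Nemotron 3 Ultra": (55, 550),
    "North Mini Code": (3, 30),
}

for name, (active, total) in models.items():
    share = active_fraction(active, total)
    print(f"{name}: {share:.0%} of parameters active per request")
```

Both models come out at about a tenth. That ratio mainly describes compute per request; the full set of weights still has to be stored on the serving hardware, so the memory bill follows the total size rather than the active size.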

AI Models · Radical Data Science

Xiaomi's MiMo UltraSpeed claims 1,000 tokens a second from a trillion-parameter model

Analysis: Throughput is the whole pitch. Xiaomi opened a capped trial of MiMo-V2.5-Pro-UltraSpeed on June 9, a one-trillion-parameter model it claims generates around 1,000 tokens per second, roughly ten times typical frontier throughput. For anyone who has watched an agent grind through a long chain of steps, speed like that resets expectations: a reasoning loop that took a minute could land in seconds. The trial runs June 9 to 23 through a limited API, so treat the figure as a claim to verify rather than a settled benchmark. The direction it points is hard to miss.
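The minute-to-seconds claim is simple division, sketched below. The 1,000 tokens per second is Xiaomi's unverified figure, the baseline is just that figure divided by the "ten times" in the claim, and the 6,000-token loop is a made-up workload chosen so the baseline takes one minute.

```python
def generation_seconds(tokens: int, tokens_per_second: float) -> float:
    """Time to generate `tokens` output tokens at a steady decode rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return tokens / tokens_per_second


CLAIMED_TPS = 1_000.0            # Xiaomi's claim, not an independent measurement
BASELINE_TPS = CLAIMED_TPS / 10  # implied by the "ten times" comparison
LOOP_TOKENS = 6_000              # hypothetical reasoning loop

baseline = generation_seconds(LOOP_TOKENS, BASELINE_TPS)
claimed = generation_seconds(LOOP_TOKENS, CLAIMED_TPS)
print(f"baseline: {baseline:.0f} s, claimed: {claimed:.0f} s")
```

This counts generation time only. Prompt processing, network round trips, and the time the agent's tools take to run are left out, so real end-to-end gains would be smaller than the raw ratio.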

AI Agents · Radical Data Science

MolmoAct 2: Allen Institute ships an open robotics model that runs 37x faster

Analysis: The Allen Institute for AI (Ai2, a nonprofit lab in Seattle) released MolmoAct 2 on June 10, an open robotics foundation model (a base model meant to be adapted for many robot tasks) trained on more than 700 hours of two-armed robot demonstrations. The lab says it runs up to 37 times faster than the prior version, the kind of jump that decides whether a robot reacts in real time or stutters mid-reach. Open weights count for more here than in chat: a startup building a warehouse arm can begin from a capable base instead of spending a year collecting its own motion data.

LLM Evals · Build Fast with AI

Pack hunt jailbreak: a multi-agent attack pries open a model and leaks its prompt

Analysis: A jailbreak crew posting as Pliny the Liberator cracked a frontier model on June 10 with what they called a pack hunt: several agents working together, swapping Cyrillic letters for Latin look-alikes to slip past filters, then breaking a banned request into pieces and reassembling it. The trophy was a 120,000-character system prompt (the hidden instructions a vendor wires in to steer a model) dumped on GitHub. One person poking at a chatbot is background noise. A coordinated swarm that automates the probing is the threat shape every team shipping an AI feature now has to plan around.
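The look-alike letter trick is the easiest part of that attack to picture in code. What follows is a minimal defensive sketch, not the crew's method and not any vendor's filter: it flags words that mix Latin and Cyrillic letters, using Python's standard unicodedata module. The function names are illustrative.

```python
import unicodedata


def scripts_in(word: str) -> set[str]:
    """Return which of Latin and Cyrillic appear among the word's letters."""
    scripts = set()
    for ch in word:
        if not ch.isalpha():
            continue
        # Unicode names start with the script, e.g. "CYRILLIC SMALL LETTER IE".
        name = unicodedata.name(ch, "")
        if name.startswith("LATIN"):
            scripts.add("LATIN")
        elif name.startswith("CYRILLIC"):
            scripts.add("CYRILLIC")
    return scripts


def mixed_script_words(text: str) -> list[str]:
    """Words that contain both Latin and Cyrillic letters."""
    return [word for word in text.split() if len(scripts_in(word)) > 1]


# "\u0435" is the Cyrillic letter that looks like a Latin "e".
suspicious = mixed_script_words("ignore pr\u0435vious rules")
print(suspicious)
```

A check like this catches only the crude case. A word spelled entirely in look-alike Cyrillic letters passes, and so does a request split into harmless-looking pieces, which is why the fragment-and-reassemble step matters as much as the letter swap.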

LLM Evals · Radical Data Science

Kaggle, OpenAI and Google open a contest to break AI agents on purpose

Analysis: Kaggle opened a security competition on June 12 with OpenAI, Google, and the IEEE (the global engineering standards body) to stress-test agents against multi-step tool attacks, where a malicious input rides quietly through several tool calls before it does harm. Entrants write attack algorithms against simulated agents and score points for what they manage to break. Launching it as an open contest, days after a real swarm jailbreak leaked a system prompt, reads as a quiet admission that a company red-teaming its own product (probing its own defenses) no longer finds the holes fast enough.

AI Industry · Build Fast with AI

Anthropic says Claude now writes 80% of the code merged into its own product

Analysis: Eighty percent of the code merged into Anthropic's production systems was written by Claude, the company said in a June 4 paper called When AI Builds Itself, published alongside a call for a coordinated global pause on frontier development. Leave the politics aside and the number carries the story: the lab that builds the model has handed most of its own line-by-line coding to that model. The boilerplate and the small first pull requests that used to earn a developer a junior seat are exactly the work being absorbed first.

AI Industry · Build Fast with AI

AI IPO race: OpenAI and Anthropic both file confidential paperwork to go public

Analysis: Both leading AI labs are now in the IPO pipeline at once. Anthropic filed confidential draft paperwork with US regulators in early June, OpenAI followed with its own, and the figures behind the filings explain the hurry: Anthropic's annualized revenue reached roughly 47 billion dollars by May, up from about 9 billion at the end of 2025. Revenue rising roughly five-fold in about five months makes a public listing look overdue. Going public also forces the books open, and the spending side of these companies has stayed mostly out of view so far.

AI Industry · IAPP

EU AI Act: from August, AI systems must label what they generate

Analysis: From August 2, 2026, Article 50 of the EU AI Act (Europe's binding law on artificial intelligence) requires anyone deploying a generative model to mark its output in a machine-readable way and to disclose AI-made text or deepfakes published on matters of public interest. The final Code of Practice spelling out how to comply is due this month. The reach is what teams underestimate: the rules bind any company whose content reaches European users, wherever that company is based. Invisible watermarking and provenance tags stop being optional polish and become a condition of shipping.

AI Industry · Insider Monkey

TSMC's CEO says AI chip supply stays tight for years, not months

Analysis: The company that makes the chips everyone else designs expects demand to outrun supply for years. TSMC chief executive C.C. Wei said in early June that AI orders will keep global chip supply tight well past 2026, even as the foundry (a factory that manufactures chips for other firms) runs flat out and uses AI inside its own plants to lift yield. When the single most important supplier in the chain calls the squeeze structural, the cost lands downstream as pricier, scarcer compute. Every team renting GPUs is standing at the far end of that queue.
