AI Cost Squeeze | AI Field Notes #61

An engineer feeds coins into a neural-network machine while a wall meter nears a red limit, showing how the cost of running AI now rations it.

AI cost squeeze: the rising cost of running models is now the constraint shaping engineering decisions. Tesla capped employee AI spending at $200 a week after engineers burned through thousands, and the UK's AI Security Institute showed agent capability keeps climbing the more compute you buy. Mistral shipped Leanstral 1.5, a free open model that solved 587 Putnam math problems and found real bugs, while Microsoft is spending $2.5 billion to embed engineers inside customers and prove enterprise AI pays off. Anthropic tightened Claude Code access for Chinese firms and weighed its own Samsung-made chip, as AI bug-hunters logged 1,500 high-severity flaws in June, 3.5 times the human record.


AI Models · Mistral AI

Mistral Leanstral 1.5: an open proof model that solved 587 Putnam problems

Analysis: Formal proof engineering, the slow work of getting a computer to verify that a piece of math or code is provably correct, just got a capable free tool. Mistral released Leanstral 1.5 on July 2 under an Apache-2.0 license (free to use, modify, and sell), built for Lean 4, a language for writing machine-checkable proofs. It solved 587 of 672 problems from the Putnam undergraduate math competition and scored 100% on the miniF2F benchmark. Pointed at real repositories, it surfaced five previously unknown bugs. The model carries 119 billion parameters but fires only 6 billion per query, which keeps it cheap to run.
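For readers who have not seen Lean 4, here is a toy illustration of what "machine-checkable" means. It is not drawn from Leanstral's output or from the Putnam set; the theorem names are made up for this sketch, and the only library fact used is the core lemma `Nat.add_comm`.

```lean
-- A toy Lean 4 file. The checker accepts it only if every proof type-checks.

-- `n + 0` reduces to `n` by the definition of addition on Nat,
-- so reflexivity (`rfl`) is enough to close the goal.
theorem add_zero_toy (n : Nat) : n + 0 = n := rfl

-- Commutativity does not hold by mere computation; it needs induction.
-- Here we simply reuse the lemma already proved in Lean's core library.
theorem add_comm_toy (a b : Nat) : a + b = b + a := Nat.add_comm a b
```

A competition problem is stated the same way, as a theorem signature, and the model's job is to produce the term or tactic script after `:=` that the checker will accept.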

Read the source for Mistral Leanstral 1.5: an open proof model that solved 587 Putnam pro… · Mistral AI · mistral.ai

LLM Evals · The Decoder

AI bug-hunters logged 1,500 high-severity CVEs in June, 3.5x the human record

Analysis: In June, 21 organizations reported roughly 1,500 high-severity and critical software vulnerabilities (CVEs, catalogued security flaws), more than 3.5 times the previous monthly record, and most were found by AI rather than people. Epoch AI, a research group that tracks the field, charted the surge starting in April, when Anthropic released Claude Mythos Preview, its bug-hunting model. Anthropic's Glasswing program alone has turned up over 10,000 vulnerabilities. The catch lands on the other side of the pipe: Anthropic conceded the tool 'finds bugs faster than developers can patch them.' Discovery scaled overnight; remediation did not.

Read the source for AI bug-hunters logged 1,500 high-severity CVEs in June, 3.5x the huma… · The Decoder · the-decoder.com

AI Agents · The Information

Claude Code's China problem: Anthropic locks the door, Alibaba tells staff to delete it

Analysis: Alibaba told its own engineers to stop using Anthropic's Claude Code and to wipe every Claude model off work machines, citing security and backdoor worries. On the other side of the Pacific, Anthropic is closing the workarounds, cloud resellers and overseas shells, that let Chinese firms reach its models despite terms that ban sales to China-controlled companies. The spark: researchers found hidden code in Claude Code that could flag users based in China. Anthropic's Thariq Shihipar called it 'an experiment from March to stop account abuse and distillation,' since replaced by stronger safeguards. Two companies, one tool, both slamming doors.

Read the source for Claude Code's China problem: Anthropic locks the door, Alibaba tells… · The Information · theinformation.com

AI Agents · The Information

Microsoft Copilot overhaul: one app, background AutoPilot agents, features cut

Analysis: Microsoft is folding its consumer and enterprise Copilot apps into one and bolting on a new tier called AutoPilot, agents that run tasks like scheduling and email triage in the background without being asked. Planned for August, the overhaul kills underused pieces such as Copilot Podcasts and Copilot Labs. Executive VP Jacob Andreou said the app has to 'earn the right to exist' and focus on 'real work' over 'intelligence for intelligence's sake.' It is the same super-app bet Anthropic and OpenAI are making with Claude Code and Codex: stop shipping features, start finishing jobs.

Read the source for Microsoft Copilot overhaul: one app, background AutoPilot agents, fea… · The Information · theinformation.com

AI Industry · Microsoft

Microsoft's $2.5B Frontier unit puts 6,000 engineers inside customers to prove ROI

Analysis: Chasing proof that enterprise AI actually pays, Microsoft is spending $2.5 billion to stand up a unit called Frontier and embedding 6,000 engineers and industry specialists directly inside customer companies. Judson Althoff, who runs Microsoft's commercial business, framed the job as co-designing and continuously improving AI systems 'based on measurable business outcomes,' corporate-speak for: the chatbots did not obviously move the numbers, so here are humans to make them. Microsoft is selling it as platform-neutral, leaning on Accenture, PwC, and EY to scale. OpenAI's rival DeployCo runs on $4 billion and about 150 on-site engineers.

Read the source for Microsoft's $2.5B Frontier unit puts 6,000 engineers inside customers… · Microsoft · blogs.microsoft.com

AI Industry · The Information

Tesla caps engineers at $200 a week of AI after some burned thousands

Analysis: Tesla just put a $200-a-week ceiling on how much AI each employee can spend, effective July 6, after software engineers were routinely burning through thousands of dollars in model tokens a week. The company had rolled out an internal platform called Bottle Rocket offering models from OpenAI, Anthropic, xAI, and Cursor, and pushed staff toward Cursor's Composer and Elon Musk's Grok. Engineers reached for Anthropic's Claude instead. Only beta versions of xAI's own products escape the cap. The memo, reported by The Information, is a rare hard number on what heavy AI coding actually costs per head.
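To get a feel for how quickly a per-head cap of this size runs out, the sketch below tracks spend against a weekly ceiling. Only the $200 figure comes from the memo as reported; the class, the function, and especially the per-million-token price passed in are hypothetical placeholders, not Tesla's tooling or any vendor's actual rate.

```python
from dataclasses import dataclass

WEEKLY_CAP_USD = 200.0  # the weekly ceiling reported in the memo


@dataclass
class WeeklyBudget:
    """Hypothetical per-employee tracker; not how Bottle Rocket works."""
    cap_usd: float = WEEKLY_CAP_USD
    spent_usd: float = 0.0

    def remaining(self) -> float:
        # Dollars left this week, never negative.
        return max(self.cap_usd - self.spent_usd, 0.0)

    def charge(self, tokens: int, usd_per_million_tokens: float) -> bool:
        """Record a request if it fits under the cap; return False otherwise."""
        cost = tokens / 1_000_000 * usd_per_million_tokens
        if cost > self.remaining():
            return False  # over the cap: reject and leave spend unchanged
        self.spent_usd += cost
        return True


def tokens_affordable(budget_usd: float, usd_per_million_tokens: float) -> int:
    """How many tokens a budget buys at an assumed blended price."""
    return int(budget_usd / usd_per_million_tokens * 1_000_000)


if __name__ == "__main__":
    assumed_price = 10.0  # USD per million tokens, an illustrative assumption
    print(tokens_affordable(WEEKLY_CAP_USD, assumed_price), "tokens per week")
```

Swapping in a real blended input and output price for a given model turns this into a rough estimate of how many long agent runs fit inside one week's allowance.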

Read the source for Tesla caps engineers at $200 a week of AI after some burned thousands · The Information · theinformation.com

LLM Evals · The Decoder

UK AI Security Institute: standard benchmarks lowball what agents can do

Analysis: Give an AI agent ten times the compute budget and it does markedly better work, which means most published benchmark scores understate what these systems can actually do. The UK's AI Security Institute (AISI), the government body that stress-tests frontier models, found that raising the token budget from 1 million to 10 million lifted success on software-engineering tasks by about 25%. In cybersecurity, roughly 8% of tasks were only solvable above 10 million tokens, some needing 50 million. At those budgets, model capability doubled every 40 to 50 days, against 67 to 91 days at a fixed low cap.
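The gap between those doubling times is easier to see when compounded. The sketch below converts a doubling time into a growth multiple over a fixed horizon, assuming clean exponential growth at a constant rate; that assumption, the one-year horizon, and the function name are illustrative choices, not part of the AISI analysis.

```python
def growth_factor(doubling_days: float, horizon_days: float = 365.0) -> float:
    """Multiple reached after horizon_days if a quantity doubles every doubling_days."""
    if doubling_days <= 0:
        raise ValueError("doubling_days must be positive")
    return 2.0 ** (horizon_days / doubling_days)


if __name__ == "__main__":
    # Doubling-time ranges quoted in the item above.
    scenarios = {
        "high token budget, fast end": 40,
        "high token budget, slow end": 50,
        "fixed low cap, fast end": 67,
        "fixed low cap, slow end": 91,
    }
    for label, days in scenarios.items():
        print(f"{label}: doubles every {days} days, "
              f"about {growth_factor(days):.0f}x over a year")
```

Under that assumption, a difference of a few weeks in doubling time becomes roughly an order of magnitude in the one-year multiple, which is why the choice of token cap matters so much to how a benchmark trend reads.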

Read the source for UK AI Security Institute: standard benchmarks lowball what agents can… · The Decoder · the-decoder.com

AI Industry · Reuters

Zuckerberg tells staff Meta's AI agent bet 'hasn't come to fruition' after four months

Analysis: Meta reorganized the entire company around AI agents this spring, and Mark Zuckerberg just told an internal town hall it 'hasn't really accelerated in the way that we expected' over the past four months. The reorganization was not free: Meta cut roughly 10% of its global staff in May and moved about 7,000 people onto AI teams, while committing up to $145 billion to AI infrastructure this year. Zuckerberg now says he expects tangible results in three to six months, the kind of timeline that resets when it lapses. Reuters obtained audio of the meeting.

Read the source for Zuckerberg tells staff Meta's AI agent bet 'hasn't come to fruition'… · Reuters · reuters.com

AI Industry · The Wall Street Journal

Kuaishou's Kling raises $2B at an $18B valuation ahead of a Hong Kong IPO

Analysis: Kling, the AI video arm of Chinese social-video company Kuaishou, raised $2.04 billion (13.82 billion yuan) at an $18 billion valuation, with room for the round to grow to $3 billion, and plans to spin off and list in Hong Kong. Tencent, Citic Securities, and several state-linked funds joined the round; Kuaishou would keep about 68% if more investors pile in. The money is a bet that AI-generated video becomes a real business before rivals lock it up. Kling names Google's Veo, Runway, and ByteDance's Seedance as the field. By its own account, it is still early on making money.

Read the source for Kuaishou's Kling raises $2B at an $18B valuation ahead of a Hong Kong… · The Wall Street Journal · wsj.com

AI Industry · The Information

Anthropic explores its own Samsung-made AI chip while insisting Nvidia still matters

Analysis: Anthropic is in early talks with Samsung to build a custom AI chip of its own, according to The Information, even as it publicly insists that silicon from Nvidia, Google, and Amazon stays 'central to its strategy.' There is no design yet, no stated power target, and Anthropic will not discuss a roadmap. What it does have is Clive Chan, a chip engineer poached from the custom-silicon teams at both Tesla and OpenAI, hired to build out a dedicated group. The pattern is familiar: every lab paying Nvidia's margins eventually starts drawing its own chips to claw some back.

Read the source for Anthropic explores its own Samsung-made AI chip while insisting Nvidi… · The Information · theinformation.com

AI Models · Business Insider

Meta's 'Watermelon' model catches GPT-5.5, but burns 10x the compute to do it

Analysis: Meta's next training run, code-named Watermelon, has pulled level with OpenAI's GPT-5.5, Meta AI chief Alexandr Wang told staff, with one heavy asterisk: it took 'an order of magnitude more compute' than Meta's prior model, Avocado. Reaching parity on ten times the compute is an expensive way to tie. Meta is buying the same finish line rivals reached more cheaply, which is a strange thing to present internally as progress. The claim landed the same week Zuckerberg conceded Meta's agent push had stalled, and Wang spent part of it on X doing damage control.

Read the source for Meta's 'Watermelon' model catches GPT-5.5, but burns 10x the compute… · Business Insider · businessinsider.com
