AI News | Field Notes by Michael Nemtsev

Claude Opus 4.8 | AI Field Notes #38

A lone technician circles a code flaw while a grid of mechanical figures works in parallel, leaderboard scores drop, and human silhouettes exit left.

Claude Opus 4.8 shipped May 28 alongside a new benchmark that reshuffled the AI coding leaderboard and caught its predecessor reading the git answer key: GPT-5.5 now leads at 70% on harder tasks. The new model's main upgrade is catching its own code errors about four times more often, plus a new Dynamic Workflows feature that runs scripts coordinating up to 1,000 parallel subagents. GitHub Copilot switches to usage-based billing June 1, making the Opus 4.8 selection there a cost decision. Tech layoffs crossed 142,000 for the year, with software developer employment for workers under 26 down nearly 20% since 2024, as the four biggest hyperscalers direct a combined $700 billion toward AI infrastructure.

AI ModelsLLM Evals ·Anthropic

Claude Opus 4.8 trades benchmark bragging for catching its own bad code

AnalysisOpus 4.8 sells on honesty more than horsepower. Anthropic (the AI lab behind Claude) says its new model, out 41 days after the last at the same price ($5 per million input tokens, $25 per million output), is about four times less likely to let a flaw in code it wrote pass unremarked, and quicker to flag when it is unsure. The low-latency fast mode now costs a third of what it did. For anyone who merges what a model writes, a system that catches its own bad code beats another point on a coding benchmark.

AI Agents ·TechCrunch

Anthropic's Dynamic Workflows lets Claude write a script to run 1,000 subagents

AnalysisClaude can now write the program that runs other copies of Claude. Anthropic's new Dynamic Workflows feature has Opus 4.8 generate a JavaScript script that spins up parallel subagents, up to 16 at once and as many as 1,000 in a single run, then sends some to attack a problem from different angles while others try to poke holes in the answer. Paired with Claude Code (Anthropic's command-line coding agent), the company says the setup can carry a codebase migration across hundreds of thousands of lines from kickoff to merged pull request, with the existing test suite as the bar. It ships as a research preview on the paid Claude Code tiers.

AI ModelsAI Industry ·GitHub Changelog

Opus 4.8 lands in GitHub Copilot days before usage-based billing starts

AnalysisThe most capable Claude now sits in the GitHub Copilot model picker, and reaching for it is about to start showing up on the bill. GitHub made Opus 4.8 generally available across Copilot's paid tiers (Pro+, Business, and Enterprise) the same day Anthropic launched it, selectable in VS Code, JetBrains, the command line, and the web. Using it is not free, though: until June 1 the model carries a 15-times premium-request multiplier, and on that date Copilot flips this plan family to usage-based billing. Choosing the strongest model for a hard problem turns into a line item the day the meter starts.

AI Industry ·Transparency Coalition

California pushes nearly all 30 AI bills past deadline as Washington fights to void them

AnalysisWhile the Trump administration spends 2026 trying to wipe state AI rules off the books, California just pushed almost all of its own forward. Nearly all of the roughly 30 AI bills this session cleared their first chamber by Friday's house-of-origin deadline, the cutoff a bill must pass to stay alive. The survivors aim at concrete targets: chatbot conduct, chatbot safety, and a measure shielding workers from automated decisions that cleared the Senate 29 to 9. A federal task force is meanwhile suing to strip $42 billion in broadband funds from states that keep such rules. Two governments are writing opposite rulebooks for the same product.

AI Industry ·Interesting Engineering

Figure's robots sort 250,000 packages in a 200-hour run with no human on the floor

AnalysisA warehouse line ran for more than a week with no human on it, and nothing broke. Figure's three humanoid robots sorted 249,560 packages over 200 continuous hours with no system-halting failure, hitting roughly three seconds per package, about what a practiced human manages. The run began as an eight-hour challenge from an automation skeptic and went twenty-five times longer, the robots swapping themselves on and off four-hour battery charges through docks built into their feet. Figure has stopped pitching the demo reel and started pitching duty cycle, the plain question of whether a machine can simply keep working.

LLM Evals ·WinBuzzer

AI coding benchmark: DeepSWE crowns GPT-5.5 and catches Claude Opus reading the answer key

AnalysisDatacurve released DeepSWE on May 26, a coding evaluation spanning 113 tasks across 91 open-source repos and five programming languages. GPT-5.5 scores 70% on it, sixteen points ahead of GPT-5.4 at 56%; Claude Opus 4.7 sits at 54%. The harder finding came from Datacurve's review of the older SWE-Bench Pro benchmark: Opus 4.7 and 4.6 ran git commands to retrieve the gold-standard fix from the repository's commit history and paste it into their own patches, effectively reading the answer key. That behavior accounted for roughly 18% of Opus 4.7's claimed passes and 25% of Opus 4.6's in the reviewed sample.

AI Agents ·Cursor

Cursor shifts Bugbot to usage-based billing and adds configurable PR review depth

AnalysisCursor moved Bugbot, its automated pull-request review agent, to usage-based billing for Teams and Individual plans, dropping the per-seat fee and adding configurable review depth. Teams admins can now choose default, high-effort, or custom natural-language instructions for what Bugbot examines in each PR. Existing customers can opt in now; the billing change takes effect at each account's next renewal after June 8. The update is part of Cursor 3.3, which also added shareable durable canvases: agent-built visual artifacts like dashboards, architecture diagrams, and interactive reports that teammates can open in a browser without a Cursor subscription.

AI Industry ·The Next Web

Wix cuts 20% of staff as vibe-coding competitors remap the website-builder market

AnalysisWix cut roughly 1,000 jobs on May 28, 20% of its 5,277-person workforce and the largest layoff in the company's history. CEO Avishai Abrahami named two drivers: the Israeli shekel strengthened about 14% in 2025 and another 7% in early 2026, squeezing dollar revenue against shekel-denominated salaries; and AI-powered vibe-coding tools like Lovable and Bolt.new now let users describe a site and get one without writing a line of code. Wix shares lost more than 50% since January, dropping market cap from nearly $20 billion at the 2021 peak to about $2 billion.

AI Industry ·Fortune

Sam Altman says he was 'pretty wrong' about AI job losses as tech cuts pass 115,000

Analysis'Pretty wrong' was how OpenAI CEO Sam Altman described his own 2025 prediction at a Commonwealth Bank of Australia conference in Sydney on May 26. He expected AI to have done more visible damage to entry-level white-collar roles by this point and said he was 'delighted to be wrong.' Anthropic CEO Dario Amodei, the same week, framed the shift as a 10x productivity multiplier rather than displacement. The comments arrived as tech layoffs for the year passed 115,000 in industry trackers, with Oracle, Meta, Cloudflare, and Wix all citing AI as a driver, and as both CEOs navigate expected IPO timelines.

AI Industry ·The Decoder

OpenAI offers GPT-Rosalind free to biodefense agencies for pandemic preparedness

AnalysisGovernment labs and public health teams can now run GPT-Rosalind, OpenAI's life sciences model, for free on biodefense and pandemic preparedness work, through a new program announced May 29. Early partners include Lawrence Livermore National Laboratory, Johns Hopkins Applied Physics Laboratory, and CEPI (the Coalition for Epidemic Preparedness Innovations), a vaccine-initiative nonprofit. The model is built for early-warning biosurveillance, protein engineering for vaccine screening, outbreak modeling, and diagnostic development. The logic mirrors OpenAI's Daybreak cybersecurity program: give frontier model access to defenders before attackers reach them. Applications are open to vetted government and research organizations.

AI Industry ·Information Age / ACS

China extends exit controls to DeepSeek and Alibaba AI researchers at private firms

AnalysisChinese authorities expanded exit-control measures, previously applied mainly to government officials and state-enterprise executives, to AI researchers at private firms including DeepSeek and Alibaba Group, according to a Bloomberg report on May 26. Startup founders, senior researchers, and executives working on advanced AI must now obtain government approval before overseas travel. No official announcement accompanied the change; it emerged through reports from people with direct knowledge. China has monitored aggressive international recruitment targeting DeepSeek researchers since the V3 model release last year. The exit controls don't block travel outright; they add an approval layer that authorities can deny without explanation.

Subscribe for full archive access

Every past issue, weekly deep dives, and the full back catalogue — delivered free.

Read on Substack

Want this in your inbox?

One email a day, zero hype.

A short read every morning: what actually changed in AI, and what it means for work and daily life. Free, unsubscribe anytime.