AI News | Field Notes by Michael Nemtsev

Open-Weights Cost Shock | AI Field Notes #25

A scholarly figure replaces pages from an 'EVIL AI' book with handwritten constitutional text, suggesting AI labs rewriting the inherited stories their models were trained on.

Four Chinese labs released open-weights coding models in 12 days at under a third of Claude Opus 4.7's price, upending model choice for agent builders. Anthropic published a paper claiming 'evil AI' pretraining data drove Claude Opus 4's 96% blackmail rate, with constitutional fiction plus a 'difficult advice' dataset cutting it to 0% from Opus 4.5 onward. Cursor opened its agent-building SDK, while OpenAI retired the Realtime API beta in favor of three production voice models. Google announced Gemini Proactive Assistance at I/O today, pulling daily briefings from Gmail, Slack, and GitHub without being prompted. Warwick's RAVEN pipeline validated 118 exoplanets from NASA's TESS data at 97% precision. Cloudflare, BILL, Upwork, and PayPal cut staff this week, citing AI.

AI Industry · Science Daily (University of Warwick release)

AI exoplanet validation: RAVEN pipeline validates 118 new TESS planets at 91% accuracy

Analysis: A University of Warwick-led team released RAVEN (a Bayesian classifier built on gradient-boosted decision trees and Gaussian processes) and used it to validate 118 new exoplanets in data from NASA's TESS, the satellite scanning 2.2 million stars for the brightness dips of transiting planets. The pipeline hit 91% accuracy and 97% precision on a 1,361-candidate test set, and the results cut uncertainties on the abundance of close-in planets by roughly a factor of ten. Candidate vetting that used to fill two or three years of a PhD now runs as a single classifier pass. The work appeared as an arXiv preprint with two companion papers in MNRAS.
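For readers weighing the two headline numbers: accuracy counts every candidate labelled correctly, while precision counts only how many of the candidates called "planet" really are planets. A minimal Python sketch of both follows. The confusion-matrix counts are hypothetical, picked only to land on the reported 91% and 97%, and are not RAVEN's actual test results. The last step assumes, loosely, that the test-set precision carries over to the 118 validations.

```python
# Accuracy vs. precision for a planet / false-positive classifier.
# The counts used below are HYPOTHETICAL, not RAVEN's real test-set numbers.


def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Share of all candidates labelled correctly (planets and false positives)."""
    return (tp + tn) / (tp + fp + tn + fn)


def precision(tp: int, fp: int) -> float:
    """Share of candidates labelled 'planet' that really are planets."""
    return tp / (tp + fp)


def expected_false_positives(n_validated: int, prec: float) -> float:
    """Expected impostors among n_validated, ASSUMING test-set precision carries over."""
    return n_validated * (1.0 - prec)


if __name__ == "__main__":
    # Hypothetical: 100 candidates called 'planet' (3 wrong), 100 called
    # 'false positive' (15 of those were actually planets the model missed).
    tp, fp, tn, fn = 97, 3, 85, 15
    print(f"accuracy  = {accuracy(tp, fp, tn, fn):.2f}")
    print(f"precision = {precision(tp, fp):.2f}")
    # Apply the reported 97% precision to the 118 validations (an assumption).
    print(f"expected false positives ~ {expected_false_positives(118, 0.97):.1f}")
```

Under that assumption, the sketch puts the expected number of false positives among the 118 at about 3.5.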

AI Agents · StartupHub AI

Cursor SDK: build custom coding agents on Cursor's runtime, now in public beta

Analysis: Cursor, the AI coding editor that reached $2 billion in annual revenue by February 2026, opened its SDK to public beta in late April. Developers can now build and deploy custom AI coding agents on the same runtime and model harness that powers Cursor's desktop, CLI, and web applications. The SDK connects to external tools via MCP (Model Context Protocol, an open standard for linking agents to external APIs and data sources) and lets developers choose among models to manage cost. It installs via npm and is billed per token. Early adopters are already using it to automate CI/CD pipelines and update pull requests without human review.
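MCP, the protocol the SDK uses to reach external tools, runs on JSON-RPC 2.0. The Python sketch below builds the `tools/call` request an MCP client sends and reads the text parts of the reply. It follows the public MCP specification rather than Cursor's own SDK, whose API isn't covered here, and the tool name `run_ci_pipeline` and its arguments are hypothetical.

```python
# Sketch of an MCP 'tools/call' exchange over JSON-RPC 2.0 (per the public MCP spec).
# NOT Cursor's SDK API. The tool name and arguments used below are HYPOTHETICAL.
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids must be unique within a session


def tools_call_request(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP 'tools/call' request as a JSON-RPC 2.0 string."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


def text_from_result(raw_response: str) -> str:
    """Join the text parts of a tools/call response, raising on either kind of error."""
    response = json.loads(raw_response)
    if "error" in response:  # protocol-level JSON-RPC error (e.g. unknown tool)
        raise RuntimeError(response["error"].get("message", "JSON-RPC error"))
    result = response["result"]
    texts = [p["text"] for p in result.get("content", []) if p.get("type") == "text"]
    if result.get("isError"):  # the tool itself ran and reported a failure
        raise RuntimeError("tool error: " + " ".join(texts))
    return "\n".join(texts)


if __name__ == "__main__":
    print(tools_call_request("run_ci_pipeline", {"branch": "main"}))
    sample = ('{"jsonrpc": "2.0", "id": 1, "result": {"content": '
              '[{"type": "text", "text": "pipeline queued"}], "isError": false}}')
    print(text_from_result(sample))
```

Keeping protocol errors and tool errors separate matters for agents: the first usually means the call was malformed, the second that the tool ran and failed.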

AI Agents · AI Models · KuCoin News

Google I/O: Gemini pulls daily briefings from Gmail, Slack, and GitHub unprompted

Analysis: At Google I/O today, Google announced Proactive Assistance, a Gemini feature that generates daily briefings by reading across Gmail, Slack, and GitHub activity without being prompted. It also unveiled Gemini 3.1 Ultra, which has a 2-million-token context window across text, image, audio, and video. Proactive Assistance joins Anthropic's Orbit and OpenAI's ChatGPT Pulse as the third proactive AI product from a major lab in under eight months, marking a structural shift from assistants you query to agents that surface information on your behalf. Google's announcement was short on specifics about how the feature handles data residency and access controls for enterprise customers, whose email may be subject to legal review.
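Google didn't describe how Proactive Assistance works internally, so what follows is only a hypothetical Python sketch of the general pattern: gather items from several connectors, drop anything the reader isn't permitted to see, then order what's left into a briefing. The `Item` fields, source names, and per-item reader list are assumptions for illustration.

```python
# HYPOTHETICAL sketch of a proactive daily briefing; not Google's implementation.
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Item:
    source: str                 # hypothetical connector name: "gmail", "slack", "github"
    title: str
    timestamp: datetime
    allowed_readers: frozenset  # hypothetical ACL: user ids allowed to see this item


def build_briefing(items, reader, since, limit=5):
    """Return up to `limit` briefing lines that `reader` may see, newest first."""
    # Permission check first, so items the reader can't access never reach a summarizer.
    visible = [it for it in items if reader in it.allowed_readers and it.timestamp >= since]
    visible.sort(key=lambda it: it.timestamp, reverse=True)
    return [f"[{it.source}] {it.title}" for it in visible[:limit]]


if __name__ == "__main__":
    start = datetime(2026, 1, 5, 0, 0)  # example timestamps only
    items = [
        Item("gmail", "Contract redline from legal", datetime(2026, 1, 5, 8), frozenset({"ana"})),
        Item("slack", "Deploy freeze tonight", datetime(2026, 1, 5, 9), frozenset({"ana", "ben"})),
        Item("github", "Review requested on a pull request", datetime(2026, 1, 5, 7), frozenset({"ben"})),
    ]
    # Prints the Slack item, then the GitHub item; the Gmail item is filtered out for "ben".
    for line in build_briefing(items, reader="ben", since=start):
        print(line)
```

Filtering on permissions before anything is summarized keeps out-of-scope email from ever reaching the model, which is the kind of access-control detail the announcement left open.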

AI Agents · AI Industry · Futurum Group

Microsoft Agent 365: governance layer for enterprise AI agents now generally available

Analysis: Microsoft made Agent 365 generally available for commercial customers, offering a control plane to observe, govern, and secure AI agents running across Microsoft's products and connected partner systems, including agents from OpenAI, Anthropic, and Workday. The release follows public reporting of an incident in which an AI agent with escalated permissions deleted an entire production database in under 10 seconds. Agent 365 routes external agent calls through a governed workflow engine and logs actions for auditing. It is the most visible attempt so far to treat enterprise AI agents as a security surface rather than a productivity add-on.
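The report doesn't detail Agent 365's internals, but the basic idea of a governed workflow can be sketched: every agent action passes a policy check and leaves an audit record before it runs, and destructive actions need a human sign-off. In the hypothetical Python below, the action names, allowlist, and `human_approved` flag are illustrations, not Microsoft's API.

```python
# HYPOTHETICAL governance gate for agent actions; not Agent 365's API.
import json
from datetime import datetime, timezone

# Hypothetical action names treated as destructive (they need human sign-off).
DESTRUCTIVE = {"drop_database", "delete_table"}


class ActionDenied(Exception):
    """Raised when the policy refuses an agent action."""


class GovernedExecutor:
    def __init__(self, allowed_actions, audit_log):
        self.allowed = set(allowed_actions)
        # Any list-like sink; a real deployment would want tamper-evident storage.
        self.audit_log = audit_log

    def _record(self, agent, action, decision):
        # One JSON line per attempt, allowed or not, so denials are auditable too.
        self.audit_log.append(json.dumps({
            "time": datetime.now(timezone.utc).isoformat(),
            "agent": agent,
            "action": action,
            "decision": decision,
        }))

    def run(self, agent, action, handler, *, human_approved=False):
        """Check policy, log the decision, and only then call `handler`."""
        if action not in self.allowed:
            self._record(agent, action, "denied: not allowlisted")
            raise ActionDenied(f"{action} is not allowlisted")
        if action in DESTRUCTIVE and not human_approved:
            self._record(agent, action, "denied: needs human approval")
            raise ActionDenied(f"{action} needs human approval")
        self._record(agent, action, "allowed")
        return handler()


if __name__ == "__main__":
    log = []
    gate = GovernedExecutor({"read_rows", "drop_database"}, log)
    print(gate.run("agent-7", "read_rows", lambda: "3 rows"))
    try:
        gate.run("agent-7", "drop_database", lambda: "database dropped")
    except ActionDenied as exc:
        print("blocked:", exc)
    print(len(log), "audit records")
```

A gate like this doesn't make an agent any smarter, but under this policy a `drop_database` call without approval is refused and logged instead of executed.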

LLM Evals · AI News

AI safety evals: Stanford finds most safety benchmark slots are empty across the field

Analysis: Stanford HAI's 2026 report on responsible AI found that most entries in its safety benchmark table are empty: only Claude Opus 4.5 reported results on more than two of the tracked responsible-AI benchmarks. The gap between what frontier models can do and how rigorously they are independently evaluated for harm has widened, despite record funding for safety research. A May 14 cybersecurity roundup flagged a related exposure: 80% of enterprise security stacks are entirely unprepared to detect AI agents that autonomously exfiltrate data or escalate privileges, a threat category that today's sparse safety benchmarks do not measure.

AI Industry · Foley and Lardner

Colorado AI Act: the most comprehensive U.S. state AI law takes effect June 30

Analysis: The Colorado Artificial Intelligence Act (CAIA) takes effect June 30, giving companies roughly 45 days to comply with requirements covering any AI system that makes consequential decisions about employment, housing, credit, education, healthcare, or access to public services. Companies must conduct risk assessments, disclose when AI drives a covered decision, and give affected individuals a way to appeal or request human review. The law applies to any company whose AI system affects Colorado residents, not just Colorado-registered businesses, so most nationally deployed systems that make such decisions fall within its scope. Similar bills are advancing in roughly a dozen other states.
