AI News | Field Notes by Michael Nemtsev

Copilot Token Billing Shift | AI Field Notes #28

A figure at a terminal under a ticking cost meter as coworkers file out behind them, suggesting AI's simultaneous billing shift and workforce cuts.

GitHub Copilot moves to AI Credits billing on June 1, ending flat-rate pricing for developers running agent workflows at scale. Google I/O opened today with Gemini 3.1 Flash-Lite at $0.25 per million input tokens, Android XR smart glasses, and Aluminium OS, a new ChromeOS replacement. A single week in May brought roughly 7,500 explicitly AI-justified job cuts at Cloudflare, PayPal, Coinbase, and Upwork. A Gartner survey of 350 executives found no correlation between those cuts and improved financial returns. Today's issue draws on coverage from the past week.

AI Models, AI Industry · Android Central

Google I/O 2026: Gemini 3.1 Flash-Lite, XR glasses, and Aluminium OS

Analysis: Google's I/O developer conference opened May 19 with a Gemini-forward lineup. Gemini 3.1 Flash-Lite delivers output roughly 2.5 times faster than earlier versions and is priced at $0.25 per million input tokens, which puts it within budget for high-frequency agent tasks. Android XR smart glasses, built with Samsung and Warby Parker, embed Gemini 2.5 Pro as the on-device assistant. Aluminium OS, Google's Android-based replacement for ChromeOS, also launched. A larger Gemini 4.0 was widely expected at the keynote; confirmation details were still arriving as this issue went out. The organizing theme across hardware and software is moving Gemini from chat interface to operating system layer.
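
To put the $0.25 figure in context, here is a rough back-of-the-envelope sketch. Only the input price comes from the item above; the workload numbers (50,000 calls a day, 2,000 input tokens each) are hypothetical, and output-token cost is left out because no output price is quoted here.

```python
# Illustrative arithmetic only. The input price is the figure quoted above;
# the workload below is a made-up example, not a measured one.
INPUT_PRICE_PER_MILLION_USD = 0.25


def input_cost_usd(
    calls: int,
    input_tokens_per_call: int,
    price_per_million: float = INPUT_PRICE_PER_MILLION_USD,
) -> float:
    """Return the input-token cost in USD for a batch of calls."""
    total_tokens = calls * input_tokens_per_call
    return total_tokens / 1_000_000 * price_per_million


# Hypothetical workload: 50,000 agent calls per day, 2,000 input tokens each.
daily = input_cost_usd(calls=50_000, input_tokens_per_call=2_000)
print(f"input cost per day: ${daily:.2f}")
```

Under those assumptions the input side comes to $25.00 a day, which is the sense in which the price suits high-frequency agent tasks.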

AI Agents, LLM Evals · Digital Watch Observatory

Microsoft MDASH: agentic AI system finds 16 Windows vulnerabilities, zero false positives

Analysis: Microsoft announced May 12 that MDASH, a multi-model agentic security system using over 100 specialized AI agents, found 16 previously unknown vulnerabilities across Windows components including the TCP/IP stack, IKEv2 authentication services, DNS handling, and Netlogon. Several were reachable over the network without authentication. In private testing, MDASH hit a 100% detection rate with zero false positives on deliberately inserted bugs, and 96% recall on five years of confirmed Microsoft Security Response Center cases in the clfs.sys driver. Security tooling running at that recall rate across a component's full history changes the calculus for what a small security team can realistically audit.

LLM Evals, AI Industry · Build Fast With AI

US Commerce Department: all five frontier AI labs now under pre-deployment review

Analysis: The US Commerce Department's Center for AI Standards and Innovation (CAISI) finalized pre-deployment evaluation agreements with OpenAI, Anthropic, Google DeepMind, Microsoft, and xAI. Every major model release from those labs now goes through a government safety review before public launch. The agreements operate under the December 2025 executive order directing a minimally burdensome national AI policy framework. The evaluations are advisory rather than blocking, so they do not create an approval gate. Even so, a mandatory review pipeline covering every frontier lab simultaneously is structurally different from the voluntary testing commitments those labs were making under the prior administration.

LLM Evals · National CIO Review

AISI: AI cyber capability doubling every 4.7 months, evaluation frameworks falling behind

Analysis: The UK AI Security Institute (AISI) estimated in late 2025 that AI cyber task performance was doubling every 8 months. By February 2026, that figure had compressed to 4.7 months. Claude Mythos Preview, an Anthropic research model, became the first AI system to fully complete both of AISI's simulated enterprise attack environments. AISI acknowledged its own framework is becoming inadequate: without token caps, success rates are high enough that the standard 'time horizon' metric breaks down as a meaningful measure. The practical shift is that AI-assisted penetration testing and vulnerability discovery are crossing real-world capability thresholds faster than the labs building the evaluation tools anticipated.
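
For a sense of what the two doubling times imply, the sketch below converts each into a yearly growth multiple. It assumes clean exponential growth at a constant doubling time, which is a simplification for illustration and not a claim about how AISI models the trend.

```python
def growth_per_year(doubling_months: float) -> float:
    """Yearly growth multiple implied by a constant doubling time in months."""
    return 2 ** (12 / doubling_months)


# The two doubling times quoted in the item above.
for label, months in [("late 2025 estimate", 8.0), ("February 2026 estimate", 4.7)]:
    multiple = growth_per_year(months)
    print(f"{label}: doubling every {months} months is about {multiple:.1f}x per year")
```

Under that assumption, an 8-month doubling time works out to roughly 2.8x a year, while 4.7 months works out to roughly 5.9x a year.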

AI Industry, AI Models · TechCrunch

ChatGPT personal finance: bank accounts via Plaid, 12,000 institutions, Pro users first

Analysis: OpenAI launched a personal finance preview for ChatGPT Pro users in the US on May 15, connecting to more than 12,000 financial institutions through Plaid (a bank-linking service), including Chase, Fidelity, Schwab, and Robinhood. Users can analyze spending, track portfolios, monitor subscriptions, and build plans against their actual account data. GPT-5.5 handles the financial conversations. The feature cannot view full account numbers or execute transactions. Intuit support is planned, which would add tax impact analysis. OpenAI is moving from AI conversation tool to financial services access layer, a positioning shift that changes both its compliance exposure and its relationship with the banks whose data it now reads.

AI Agents · Vercel Blog

Vercel AI SDK 6: Agent interface, DurableAgent, full MCP support, and DevTools

Analysis: Vercel released AI SDK 6 with a formal Agent interface, DurableAgent for resumable workflow steps, full Model Context Protocol (MCP) support, a DevTools browser panel for inspecting agent state, reranking, and image editing. The SDK has roughly 20 million monthly npm downloads and over 22,200 GitHub stars, making it the most widely used AI framework in the JavaScript ecosystem. DurableAgent addresses one of the persistent problems in production agents: workflows that need to pause, resume across restarts, and skip already-completed steps without re-executing them. Full MCP support means SDK-built agents can now connect to any of the 9,400-plus public MCP servers without custom integration code.
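
The durable-workflow problem described above can be illustrated with a generic checkpointing pattern. This is a minimal sketch of the idea in Python, not the AI SDK's DurableAgent API: the `CheckpointedWorkflow` class, the step names, and the JSON state file are all hypothetical and exist only for this example.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


class CheckpointedWorkflow:
    """Hypothetical helper: runs each named step once and persists results to JSON."""

    def __init__(self, state_path: Path) -> None:
        self.state_path = state_path
        # Reload results saved by an earlier run, if the state file exists.
        self.results: dict[str, object] = (
            json.loads(state_path.read_text()) if state_path.exists() else {}
        )

    def step(self, name: str, fn: Callable[[], object]) -> object:
        if name in self.results:
            return self.results[name]  # already completed: skip re-execution
        value = fn()
        self.results[name] = value
        self.state_path.write_text(json.dumps(self.results))  # checkpoint to disk
        return value


calls = []  # records which steps actually executed


def fetch():
    calls.append("fetch")
    return [1, 2, 3]


state_file = Path(tempfile.mkdtemp()) / "workflow_state.json"

# First run completes one step, then the process "restarts".
CheckpointedWorkflow(state_file).step("fetch", fetch)

# Second run: a fresh object reloads the checkpoint and skips "fetch".
resumed = CheckpointedWorkflow(state_file)
data = resumed.step("fetch", fetch)
total = resumed.step("sum", lambda: sum(data))
print(calls, total)
```

In this sketch `fetch` runs only once even though it is requested in both runs, because the second run finds its result in the state file. A production system would also need to handle non-JSON results, concurrent writers, and steps that fail midway, none of which this example attempts.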
