AI Coding Agents Expand | AI Field Notes #46


AI coding agents pushed past developers this week, in the same stretch that AI quietly automated much of the research analyst's job. OpenAI's Codex shipped role plugins for analysts and bankers, Microsoft moved agent governance into Windows at Build, and Google let developers write model evals locally from VS Code. AlphaSense nearly doubled its valuation to $7.5 billion selling document-reading as a feature, while DeepSeek raised $7.4 billion to keep the price war funded. Anthropic warned the industry has no 'brake pedal' for self-improving AI, and the Pentagon began dropping Claude after Anthropic refused to loosen its guardrails. It was a quieter news week, so this issue draws on the last 72 hours.


AI Agents · OpenAI

OpenAI Codex goes beyond developers: six role plugins for analysts and bankers

Analysis: Non-developers already make up about 20% of Codex's 5 million weekly users, and they are growing three times faster than the engineers it was built for. On June 2, OpenAI shipped "Codex for every role": six role-specific plugins spanning data analytics, sales, product design, and investment banking, bundling 62 apps and 110 skills, plus a preview of Codex Sites that lets the agent build and host interactive web apps behind a private workspace link. The coding agent that scaffolded software now drafts Snowflake dashboards and banker decks. OpenAI is quietly repositioning Codex as office software.

Source: OpenAI · openai.com

LLM Evals · Google

Kaggle Benchmarks goes local: write AI evals from VS Code and Cursor, not a web notebook

Analysis: Building a custom evaluation for an AI model used to mean working inside Kaggle's web notebook, away from your real tools. Google changed that in early June: Kaggle Benchmarks now runs locally, so a developer can create, validate, and push eval tasks straight from VS Code, Cursor, or a coding agent, using a write-kaggle-benchmarks skill that teaches the agent the SDK. The community has already built more than 10,000 eval tasks. Evals, the tests that decide which model you trust for a job, are moving from a research chore into everyday dev work, where they belong.
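For readers who have not written one, here is a rough sketch of what an eval boils down to: a set of prompts with expected answers, a scoring rule, and a pass rate. This is not the Kaggle Benchmarks SDK. Every name below (`EvalTask`, `exact_match`, `run_eval`, `toy_model`) is hypothetical, and the "model" is a lookup table standing in for a real one.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalTask:
    """One test case: a prompt and the answer we expect back (hypothetical type)."""
    prompt: str
    expected: str


def exact_match(output: str, expected: str) -> bool:
    """Score a single answer, ignoring case and surrounding whitespace."""
    return output.strip().lower() == expected.strip().lower()


def run_eval(model: Callable[[str], str], tasks: list[EvalTask]) -> float:
    """Return the fraction of tasks the model answers correctly."""
    if not tasks:
        raise ValueError("need at least one task")
    passed = sum(exact_match(model(t.prompt), t.expected) for t in tasks)
    return passed / len(tasks)


def toy_model(prompt: str) -> str:
    """Stand-in for a real model call: a fixed lookup table (assumption)."""
    answers = {"2 + 2 = ?": "4", "Capital of France?": "Paris"}
    return answers.get(prompt, "unknown")


TASKS = [
    EvalTask("2 + 2 = ?", "4"),
    EvalTask("Capital of France?", "paris"),
    EvalTask("Largest planet?", "Jupiter"),
]

score = run_eval(toy_model, TASKS)
print(f"pass rate: {score:.2f}")
```

The toy model gets two of the three tasks right. A real eval swaps `toy_model` for an API call and `exact_match` for whatever scoring the job needs, but the loop has the same shape.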

Source: Google · blog.google

LLM Evals · CNN Business

Anthropic warns AI is nearing self-improvement, asks the industry to build a 'brake pedal'

Analysis: Anthropic's policy team says the moment when AI systems can improve themselves without humans in the loop is close, and the industry has no way to stop it if that goes wrong. In a June 5 piece, co-founder Jack Clark argued the field has a gas pedal but no brake, and called for shared safeguards to pause development, comparing the task to Cold War arms-control deals struck between rivals. His evidence comes from inside the building: Claude already writes 80% of Anthropic's own code, a share Clark expects to reach 100% within a couple of years. The firm racing ahead is also the one asking everyone to agree where the brake goes.

Source: CNN Business · cnn.com

AI Models · Techgenyz

GPT-Rosalind: OpenAI's life-sciences model cuts genomics compute 31%, gates biodefense access

Analysis: OpenAI's specialized life-sciences model now does more genomics reasoning for less compute. The June 3 update to GPT-Rosalind pairs GPT-5.5's coding and tool use with drug-discovery skills, edging GPT-5.5 on GeneBench (a test of long-horizon genomics tasks) at 21.6% against 20.4% while burning 31% fewer tokens. Novo Nordisk, the drugmaker behind Ozempic, gets early access. Days earlier OpenAI launched Rosalind Biodefense, handing vetted government agencies and allies a model for pandemic work. The same capability that designs a drug can design a pathogen, so OpenAI is gating who holds it rather than shipping it open.
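A back-of-envelope way to combine the two GeneBench figures is score per token spent. The sketch below assumes the 31% token saving was measured on the same GeneBench tasks as the scores, which the item does not confirm, so read the result as illustrative only.

```python
# Figures quoted in the item above.
rosalind_score = 21.6    # GPT-Rosalind GeneBench score, percent
gpt55_score = 20.4       # GPT-5.5 GeneBench score, percent
token_reduction = 0.31   # GPT-Rosalind uses 31% fewer tokens

# Assumption: both numbers describe the same benchmark run.
relative_score = rosalind_score / gpt55_score    # score relative to GPT-5.5
relative_tokens = 1.0 - token_reduction          # tokens relative to GPT-5.5
score_per_token_gain = relative_score / relative_tokens

print(f"relative score:  {relative_score:.3f}")
print(f"relative tokens: {relative_tokens:.2f}")
print(f"score per token vs GPT-5.5: {score_per_token_gain:.2f}x")
```

Under that assumption the model delivers roughly 1.5 times the benchmark score per token, and most of that gain comes from the token saving, not from the 1.2-point score edge.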

Source: Techgenyz · techgenyz.com

AI Agents · Microsoft Security Blog

Microsoft Agent 365 SDK hits GA with OS-level containment for AI agents at Build 2026

Analysis: Microsoft used Build 2026 to push agent governance down into the operating system. The Agent 365 SDK, now generally available, lets a developer bake observability, access controls, and compliance checks into any agent on any AI platform, which then plugs into a management layer that already spans Microsoft's 11,000-model Foundry catalog. Underneath sits the Execution Container SDK on Windows, isolating what an agent can touch through process and session boundaries, plus Purview runtime data-loss prevention that blocks sensitive data before it reaches a model. The pitch is blunt: autonomous agents are a security problem, and the company that owns the OS wants to own the controls.

Source: Microsoft Security Blog · microsoft.com

AI Industry · Fox Business

Bernie Sanders wants the public to own half of OpenAI and xAI, and Trump is warming to equity

Analysis: A 50% public stake in the biggest AI companies went from fringe to bipartisan in a week. Senator Bernie Sanders introduced the American AI Sovereign Wealth Fund Act, which would force OpenAI, Anthropic, and xAI to hand half their stock to a public fund that pays Americans a dividend, on the argument that the models were trained on work the public created for free. Days later President Trump called government equity in AI giants 'a beautiful thing,' and his officials opened talks with the firms. The Cato Institute counts 20 companies in which the White House already holds stakes. Left and right now agree the state should own a piece.

Source: Fox Business · foxbusiness.com

AI Industry · Bloomberg

Pentagon tests OpenAI, Google, and Grok to replace Claude on classified systems

Analysis: The Defense Department is running OpenAI, Google, and xAI's Grok against Anthropic's Claude on classified networks, with a six-month plan to wind down its reliance on the company. The trigger: Anthropic refused to drop guardrails barring mass surveillance and autonomous weapons, after which Defense Secretary Pete Hegseth labeled it a supply-chain risk. Claude had been the primary model inside Maven Smart System, the Pentagon's classified targeting platform. Anthropic is fighting the designation in court, warning it could cost billions in revenue. The lab that marketed safety as its edge just learned the largest customer treats safety as a defect.

Source: Bloomberg · bloomberg.com

AI Industry · AlphaSense

AlphaSense doubles to a $7.5B valuation as AI eats the market-research analyst's job

Analysis: AlphaSense, an AI platform that reads filings, transcripts, and broker notes so analysts do not have to, raised $350 million at a $7.5 billion valuation, nearly double its $4 billion mark from last year. Annual recurring revenue crossed $600 million, up from $500 million in October, and Accenture signed on as its first channel partner to resell the tool to corporate research teams. The money tracks the use case: skimming hundreds of documents for the one number that moves a thesis is now a single query. Demand for the software climbs as demand for the junior who used to do that reading falls.

Source: AlphaSense · alpha-sense.com

AI Industry · Yahoo Finance (Reuters)

DeepSeek raises $7.4B in its first funding round at up to $59B, backed by Tencent and CATL

Analysis: DeepSeek, the Chinese lab whose cheap open-weight models forced a global price war, is taking outside money for the first time. The round runs to about 50 billion yuan ($7.4 billion), valuing the company at between $52 billion and $59 billion, with founder Liang Wenfeng putting in 20 billion yuan of his own and Tencent and battery maker CATL among the largest backers. Until now DeepSeek ran on its founder's hedge-fund wealth and stayed deliberately lean. Taking strategic capital, including from China's national AI fund, ties a once-independent shop to Beijing's industrial machine. The open-weights underdog is becoming a national champion.

Source: Yahoo Finance (Reuters) · finance.yahoo.com