AI Field Notes by Michael Nemtsev

Evals

LLM evaluation: why demos lie and what to measure instead.

Evals Industry · 20 Apr 2026 · asanify.com

EU's AI hiring rules go live in 105 days

If you are job hunting in Europe this summer, the algorithm reading your CV will start coming with paperwork attached. If you run hiring at a US company with European staff, the risk is not the fine; it is finding out in July that your applicant-tracking vendor cannot produce the audit trail. Better to ask the vendor now.

Evals Industry · 20 Apr 2026 · nebraska.tv

Nebraska draws a line on fabricated citations

If your work goes to a regulator, a court, or a client who will verify it, pasting model output without checking just moved from embarrassing to career-ending. Build a verification step into your workflow this month. If you do not, someone else will build one around you, loudly, in public, and possibly with a suspension attached.

Evals Models · 19 Apr 2026 · hai.stanford.edu

The US lead is now a rounding error

If you are an engineer picking a model, the 'American models are obviously better' default is gone. Test Qwen, GLM, and DeepSeek on your actual workload before you assume you need GPT-5 or Claude. If you teach or hire juniors, note the 80% student-use figure: your candidates' baseline toolset already includes AI, and your interview process probably does not reflect that.
