Blog

Notes and analysis on AI development.

Codex Tax AI handled 7,000 returns, and the improvement loop starts with evals

OpenAI and Thrive showed how Tax AI links production traces, practitioner corrections, evals, and Codex tasks.

June 1, 2026[AI]

ChatGPT Sheets security report exposed prompt injection across sidebars

PromptArmor disclosed a ChatGPT for Google Sheets exfiltration path, and OpenAI removed Apps Script code generation.

June 1, 2026[AI]

Mythos found 10,000 vulnerabilities, now patching is the bottleneck

Anthropic Project Glasswing says Claude Mythos Preview and partners can find vulnerabilities faster than teams can validate, disclose, and patch them.

June 1, 2026[AI]

SkillOpt Turns Agent Skills Into Trainable Deployment Artifacts

Microsoft SkillOpt treats SKILL.md-style agent instructions as trainable artifacts updated through rollouts, validation scores, and bounded edits.

June 1, 2026[AI]

Mistral Relaunches Le Chat as Vibe With Remote Coding Agents

Mistral relaunched Le Chat as Vibe, bundling remote coding agents, a VS Code extension, Medium 3.5, and a planned 10MW inference facility.

June 1, 2026[AI]

CoreWeave Agentic AI Turns Inference Logs Into Training Signals

CoreWeave introduced agentic AI integrations that connect inference, W&B Weave observability, serverless RL, and coding-agent tooling into one improvement loop.

June 1, 2026[AI]

GitHub Copilot App Preview Turns Issues, Checks, and Merges Into Agent Sessions

GitHub Copilot app technical preview combines issues, sessions, validation, pull requests, and Agent Merge into a desktop workflow for coding agents.

June 1, 2026[AI]

Claude Code Dynamic Workflows Put 1,000-Agent Runs Behind a Token Warning

Anthropic introduced Claude Code dynamic workflows, a research preview that lets Claude write orchestration scripts for large coding tasks while exposing new cost and permission risks.

June 1, 2026[AI]

Gemini API Managed Agents turns one API call into a Linux agent

Google previewed Gemini API Managed Agents, exposing Antigravity agents with hosted sandboxes, file state, tools, network controls, and token-heavy task loops.

May 31, 2026[AI]

Codex can now control Windows apps as coding agents move onto the PC

OpenAI Codex added Windows Computer Use and remote control from mobile. The update expands coding agents from shells and repos into desktop apps.

May 31, 2026[AI]

Braintrust moved half its team to Codex as customer requests become preview branches

OpenAI’s Braintrust Codex case study shows a coding-agent operating loop that connects customer requests, tests, sandboxes, preview branches, and evals.

May 31, 2026[AI]

Mini Shai-Hulud hit 5,718 commits and AI coding config files

CSA analyzed the Mini Shai-Hulud and Megalodon supply-chain campaigns, showing how npm attacks now reach AI coding settings and CI/CD authority.

May 31, 2026[AI]