Blog
Notes and analysis on AI development.
Codex Tax AI handled 7,000 returns, and the improvement loop starts with evals
OpenAI and Thrive showed how Tax AI links production traces, practitioner corrections, evals, and Codex tasks.
ChatGPT Sheets security report exposed prompt injection across sidebars
PromptArmor disclosed a ChatGPT for Google Sheets exfiltration path, and OpenAI removed Apps Script code generation.
Mythos found 10,000 vulnerabilities; now patching is the bottleneck
Anthropic Project Glasswing says Claude Mythos Preview and partners can find vulnerabilities faster than teams can validate, disclose, and patch them.
SkillOpt Turns Agent Skills Into Trainable Deployment Artifacts
Microsoft SkillOpt treats SKILL.md-style agent instructions as trainable artifacts updated through rollouts, validation scores, and bounded edits.
Mistral Relaunches Le Chat as Vibe With Remote Coding Agents
Mistral relaunched Le Chat as Vibe, bundling remote coding agents, a VS Code extension, Medium 3.5, and a planned 10MW inference facility.
CoreWeave Agentic AI Turns Inference Logs Into Training Signals
CoreWeave introduced agentic AI integrations that connect inference, W&B Weave observability, serverless RL, and coding-agent tooling into one improvement loop.
GitHub Copilot App Preview Turns Issues, Checks, and Merges Into Agent Sessions
GitHub Copilot app technical preview combines issues, sessions, validation, pull requests, and Agent Merge into a desktop workflow for coding agents.
Claude Code Dynamic Workflows Put 1,000-Agent Runs Behind a Token Warning
Anthropic introduced Claude Code dynamic workflows, a research preview that lets Claude write orchestration scripts for large coding tasks while exposing new cost and permission risks.
Gemini API Managed Agents turns one API call into a Linux agent
Google previewed Gemini API Managed Agents, exposing Antigravity agents with hosted sandboxes, file state, tools, network controls, and token-heavy task loops.
Codex can now control Windows apps as coding agents move onto the PC
OpenAI Codex added Windows Computer Use and remote control from mobile. The update expands coding agents from shells and repos into desktop apps.
Braintrust moved half its team to Codex as customer requests become preview branches
OpenAI’s Braintrust Codex case study shows a coding-agent operating loop that connects customer requests, tests, sandboxes, preview branches, and evals.
Mini Shai-Hulud hit 5,718 commits and AI coding config files
CSA analyzed the Mini Shai-Hulud and Megalodon supply-chain campaigns, showing how npm attacks now reach AI coding settings and CI/CD authority.