Devlery

Blog

Notes and analysis on AI development.

Hugging Face TITO Warns Agentic RL Teams About Token Drift

Hugging Face explains how retokenizing tool-using agent rollouts can break gradients, and proposes TITO as a safer training-loop rule.

IBM Agentic CLEAR tracks agent failures across three levels

IBM Research released the Agentic CLEAR paper and an open-source tool for analyzing agent traces at the system, trace, and node levels.

CoreWeave turns agent training and inference into one loop

CoreWeave’s new W&B-integrated agentic AI platform ties Serverless RL, inference, Weave observability, Skills, and MCP into one operations loop.

Claude self-hosted sandboxes set new rules for private MCP access

Anthropic added self-hosted sandboxes and MCP tunnels to Claude Managed Agents, shifting tool execution and private tool access into enterprise-controlled boundaries.

Workday ASOR and Gemini move HR agents to the approval line

Workday and Google Cloud connected Sana to Gemini Enterprise. For HR and finance agents, approval chains, permissions, and data boundaries matter more than the model.

Anthropic’s $65B Series H Turns Claude Into a Compute Race

Anthropic’s $65B Series H puts Claude demand, a $96.5B valuation, $47B revenue run rate, and AWS, Google, and SpaceX compute into one story.

OpenAI shows how Codex became an engineering backlog system

OpenAI published an internal Codex usage report. The practical signal is task queues, AGENTS.md, repo questions, migrations, tests, and incident triage.

Copilot API now grades AI adoption by user phase

GitHub Copilot usage metrics now classify users by code-first, agent-first, and multi-agent usage over a 28-day window.

Chrome Enterprise MCP turns browser security policy into agent tools

Google released a Chrome Enterprise Premium MCP server that exposes DLP rules, connector policy, browser telemetry, and activity logs to AI agents.

Claude containment design exposes 24 AWS credential leaks

Anthropic published Claude containment designs and failure cases across claude.ai, Claude Code, and Claude Cowork, turning approval fatigue, allowlists, and memory into an agent security checklist.

Robinhood opens MCP trading, but agent losses stay with users

Robinhood opened Trading MCP and Banking MCP for AI agents. The real developer story is the permission, approval, and liability model around financial tool calls.

OpenAI launches Rosalind Biodefense as a trusted-access test for GPT-Rosalind

OpenAI has launched Rosalind Biodefense, pairing GPT-Rosalind, Codex life-science tooling, and trusted access for public-health defense work.