AI

495 posts

The bill behind 73% success, agent evaluation moves beyond models

IBM Research and Hugging Face’s Open Agent Leaderboard evaluates AI agents as systems, including harnesses, costs, and failure modes.

May 20, 2026

Overeager Coding Agents Put Permission Boundaries on the Benchmark

OverEager-Bench measures whether coding agents cross the user’s authorized scope during benign tasks, using 500 scenarios and roughly 7,500 runs.

May 20, 2026

Command A+ on two H100s, and the cost threshold for sovereign AI

Cohere Command A+ is an Apache 2.0 open model aimed at enterprise agents, private deployment, and the practical cost of sovereign AI.

May 20, 2026

Why Qwen3.7 is pairing 35-hour agents with custom chips

Alibaba Qwen3.7-Max is not just a model launch. It packages agents, custom chips, 128-accelerator racks, and cloud runtime into one stack.

May 20, 2026

Genie swallowed Street View, and maps are the world-model bottleneck

Google added Street View grounding to Project Genie. The world-model race is moving from prompts toward real spatial data and responsibility boundaries.

May 20, 2026

Two AI Scientist Papers in Nature, and the Lab Bottleneck Is Still Human

Nature published papers on Google DeepMind’s Co-Scientist and FutureHouse’s Robin together. Research automation is moving from model demos to verified agent loops.

May 20, 2026

Cohere buys Reliant AI as sovereign AI moves into pharma literature

Cohere’s Reliant AI acquisition shows enterprise AI shifting from general chatbots toward regulated industry agents, evidence tracking, and data sovereignty.

May 20, 2026

Agent Timeline Turns Agent Failures Into Traceable Evidence

Honeycomb Agent Observability tries to reconstruct LLM calls, tool use, agent handoffs, and downstream systems as one traceable production event.

May 20, 2026

Grok Build Beta Puts xAI Into the Coding Agent War

xAI’s Grok Build early beta enters the coding-agent market with a terminal UI, headless execution, ACP, and Claude Code compatibility behind a $300 tier.

May 20, 2026

Why Anthropic bought the SDK plumbing behind Claude

Anthropic’s Stainless acquisition shows the agent race moving from model quality into SDKs, MCP servers, and the API plumbing agents need to act.

May 20, 2026

Full repo scanning, the SAST gap AWS Security Agent is targeting

AWS Security Agent’s full-repository code review targets the trust boundaries and data flows that traditional SAST often misses.

May 20, 2026

Zero 0.1.3 Turns Compiler Diagnostics Into an Agent API

Vercel Labs Zero is less about new syntax than JSON diagnostics, stable error codes, and typed repair metadata for coding agents.

May 20, 2026