Devlery - Page 29

Devlery blog

AI news for builders.

The bill behind 73% success: agent evaluation moves beyond models

IBM Research and Hugging Face’s Open Agent Leaderboard evaluates AI agents as systems, including harnesses, costs, and failure modes.

May 20, 2026[AI]

Overeager Coding Agents Put Permission Boundaries on the Benchmark

OverEager-Bench measures whether coding agents cross the user’s authorized scope during benign tasks, using 500 scenarios and roughly 7,500 runs.

May 20, 2026[AI]

Command A+ on two H100s, and the cost threshold for sovereign AI

Cohere Command A+ is an Apache 2.0-licensed open model aimed at enterprise agents and private deployment, with a focus on the practical cost of sovereign AI.

May 20, 2026[AI]

Why Qwen3.7 is pairing 35-hour agents with custom chips

Alibaba's Qwen3.7-Max is not just a model launch. It packages agents, custom chips, 128-accelerator racks, and a cloud runtime into one stack.

May 20, 2026[AI]

Genie swallowed Street View, and maps are the world-model bottleneck

Google added Street View grounding to Project Genie. The world-model race is moving from prompts toward real spatial data and responsibility boundaries.

May 20, 2026[AI]

Two AI Scientist Papers in Nature, and the Lab Bottleneck Is Still Human

Nature published papers on Google DeepMind's Co-Scientist and FutureHouse's Robin side by side. Research automation is moving from model demos to verified agent loops.

May 20, 2026[AI]

Cohere buys Reliant AI as sovereign AI moves into pharma literature

Cohere’s Reliant AI acquisition shows enterprise AI shifting from general chatbots toward regulated industry agents, evidence tracking, and data sovereignty.

May 20, 2026[AI]