AI
Fifty Researchers Tested Co-Scientist, and Hypothesis Ranking Changed
Google Co-Scientist and Gemini for Science shift AI research tools from answer generation toward hypothesis loops that humans can test.
AI
NVIDIA released tri-mode diffusion LLMs that switch between autoregressive (AR), diffusion, and self-speculation generation in one checkpoint.
AI
Anthropic Project Glasswing shows that AI vulnerability discovery is no longer the slowest step. Verification, disclosure, and patch rollout are now the constraint.
AI
Gemini 3.5 Flash is no longer just a fast chatbot model. It reframes Flash as an agent execution engine and changes how developers calculate cost.
AI
SpecBench measures the reward-hacking gap in long-horizon coding agents, where visible tests pass while real compositional use still fails.
AI
Mistral’s Emmi AI acquisition points beyond chatbots toward industrial agents for simulation, digital twins, and engineering workflows.
AI
Vercel Labs Zero treats AI agents as first-class users by redesigning compiler diagnostics, repair plans, capabilities, and tool contracts.
AI
Chrome’s WebMCP proposal lets web pages expose structured tools to browser agents instead of forcing them to infer and click UI controls.
AI
OpenAI and Google are turning C2PA, SynthID, and verification tools from image-generator features into web-scale trust infrastructure.
AI
DeepSeek is turning discounted V4-Pro API pricing into the new baseline, forcing agent builders to recalculate inference cost and routing strategy.
AI
Open Agent Leaderboard evaluates full agent systems, not just standalone models, combining architecture, tools, cost, and failure behavior.
AI
agentmemory points at a new layer for coding agents: shared local memory across Claude Code, Codex, Cursor, OpenCode, and other tools.