LLM

135 posts

Two H100s Are Enough, Command A+ Targets Private Agents

Cohere Command A+ lowers the bar for private AI agents with Apache 2.0 open weights and a two-H100 deployment target.

May 24, 2026

Fifty Researchers Tested Co-Scientist, and Hypothesis Ranking Changed

Google Co-Scientist and Gemini for Science shift AI research tools from answer generation toward hypothesis loops that humans can test.

May 24, 2026

Nemotron Diffusion tests the one-token-at-a-time bottleneck

NVIDIA released tri-mode diffusion LLMs that switch between AR, diffusion, and self-speculation generation in one checkpoint.

May 24, 2026

3.5 Flash Costs 6x, and Agent Models Have a New Bill

Gemini 3.5 Flash is no longer just a fast chatbot model. It reframes Flash as an agent execution engine and changes how developers calculate cost.

May 24, 2026

A 75% Discount Becomes the Baseline for DeepSeek V4-Pro

DeepSeek is turning V4-Pro API discount pricing into the new baseline, forcing agent builders to recalculate inference cost and routing strategy.

May 23, 2026

Claude Gets a Conscience Tool, and Alignment Moves Into the Agent Loop

Anthropic is widening the moral formation conversation around Claude while testing an ethical reminder tool inside the model runtime loop.

May 23, 2026

OpenAI’s unit-distance proof puts AI research automation on the record

OpenAI’s counterexample to the Erdős unit-distance conjecture shows both the promise of AI research automation and the reproducibility gap left by an unnamed model.

May 23, 2026

SageMaker opens an OpenAI-compatible door for enterprise LLM infrastructure

AWS SageMaker AI now supports OpenAI-compatible inference endpoints, moving enterprise LLM friction from model deployment toward API surfaces, IAM, and routing layers.

May 23, 2026

The 160ms Action Channel Voice Agents Need

The DuplexSLA paper reframes real-time voice-agent latency around a 160ms action channel where speech, planning, and tool calls share one timeline.

May 23, 2026

Gemini 3.5 Flash and the 14x Bill for Fast Agents

Gemini 3.5 Flash pushes speed and agent performance, but Copilot’s 14x request multiplier and early quota complaints expose the new cost bottleneck.

May 22, 2026

Command A+ Turns a 218B Open Model Into a Two-H100 Question

Cohere Command A+ reframes open model competition around agentic inference cost, private deployment, Apache 2.0 licensing, and GPU math.

May 22, 2026

OpenAI is turning YC API tokens into startup equity

OpenAI reportedly offered YC startups $2 million in API tokens through an uncapped SAFE, turning inference compute into a new investment instrument.

May 22, 2026