LLM

135 posts

11 Seconds of Audio in Under 8 Seconds, Without a GPU

Google and Arm show how on-device generative AI is moving from model releases into CPU runtimes, quantization, memory limits, and silicon features.

May 19, 2026

Google AI Overviews exposes the gap behind citation cards

A May 13 arXiv study analyzed 55K Google searches and 98K AI Overview claims, showing where citations, ranking, and publisher economics diverge.

May 19, 2026

arXiv one-year bans show the trust cost of AI citations

arXiv scrutiny of AI-generated manuscripts is not a blanket LLM ban. It is a warning about hallucinated citations entering research infrastructure.

May 19, 2026

Mistral 3 675B sets a new baseline for open model competition

Mistral 3 packages a 675B MoE model with 3B, 8B, and 14B edge models under Apache 2.0, shifting open AI competition from benchmarks to deployment.

May 18, 2026

Android is becoming an AI OS, and Gemini’s real gate is the platform

Google Gemini Intelligence tries to turn Android from an app-launching OS into an intelligence system that can read context and act.

May 18, 2026

SANA-WM 2.6B asks what a one-minute world model really costs

NVIDIA SANA-WM claims 720p, 60-second world modeling from a 2.6B backbone. The real story is not video polish but the cost structure of open models.

May 17, 2026

After DAU comes DAA: why Baidu wants an agent metric

Baidu proposed Daily Active Agents as a core AI-era metric. The useful question is not token volume, but how many agents actually complete work.

May 17, 2026

WaveSpeed’s 260-model LLM API moves model choice into the routing layer

WaveSpeed now exposes GPT, Claude, Gemini and 260+ LLMs through one OpenAI-compatible API. Here is what that means for multimodal agents, routing, cost, and trust boundaries.

May 17, 2026

GPT-5.5 crossed 50%, exposing the real bottleneck in enterprise document agents

GPT-5.5 became the first model to pass 50% on Databricks OfficeQA Pro, showing that enterprise agents still fail on parsing, retrieval, permissions, and orchestration.

May 16, 2026

General Compute targets the GPU tax on agent inference

General Compute is making its ASIC-first inference cloud generally available, challenging GPU-centric serving for agent workloads.

May 15, 2026

Thinking Machines Makes AI Collaboration Real Time

Thinking Machines Interaction Models proposes full-duplex collaboration where AI can listen, see, speak, and use tools at the same time.

May 14, 2026

Needle brings tool calling down to a 26M on-device model

Cactus Compute Needle is a 26M-parameter local model for tool calling, a small experiment that changes how agents should be designed for latency, cost, and privacy.

May 13, 2026