AI
11 Seconds of Audio in Under 8 Seconds, Without a GPU
Google and Arm show how on-device generative AI is moving from model releases into CPU runtimes, quantization, memory limits, and silicon features.
AI
A May 13 arXiv study measured 55K Google searches and 98K AI Overview claims, showing where citations, ranking, and publisher economics diverge.
AI
arXiv scrutiny of AI-generated manuscripts is not a blanket LLM ban. It is a warning about hallucinated citations entering research infrastructure.
AI
Mistral 3 packages a 675B MoE model with 3B, 8B, and 14B edge models under Apache 2.0, shifting open AI competition from benchmarks to deployment.
AI
Google Gemini Intelligence tries to turn Android from an app-launching OS into an intelligence system that can read context and act.
AI
NVIDIA SANA-WM claims 720p, 60-second world modeling from a 2.6B backbone. The real story is not video polish but the cost structure of open models.
AI
Baidu proposed Daily Active Agents as a core AI-era metric. The useful question is not token volume, but how many agents actually complete work.
AI
WaveSpeed now exposes GPT, Claude, Gemini and 260+ LLMs through one OpenAI-compatible API. Here is what that means for multimodal agents, routing, cost, and trust boundaries.
AI
GPT-5.5 became the first model to pass 50% on Databricks OfficeQA Pro, showing that enterprise agents still fail on parsing, retrieval, permissions, and orchestration.
AI
General Compute is making its ASIC-first inference cloud generally available, challenging GPU-centric serving for agent workloads.
AI
Thinking Machines Interaction Models proposes full-duplex collaboration where AI can listen, see, speak, and use tools at the same time.
AI
Cactus Compute Needle is a 26M-parameter local model for tool calling, a small experiment that changes how agents should be designed for latency, cost, and privacy.