With one API call, Google opens a serverless agent runtime

Gemini API Managed Agents hides sandboxing, state, and tool loops behind an API, moving agent competition into runtime infrastructure.

AI summary

What happened: Google opened Managed Agents in the Gemini API and introduced the Antigravity agent in preview.
- A single Interactions API call can run reasoning, code execution, file handling, and web browsing inside a Google-hosted Linux sandbox.
Why it matters: The agent race is moving beyond model calls toward runtimes with sandbox, state, tool loops, and network controls.
The numbers: Google docs say one interaction often consumes 100k-3M tokens, while complex workflows can reach 3M-5M tokens.
Watch: This is a preview feature, so teams should inspect outbound networking, data retention, token cost, and unsupported tools before production use.

Google introduced Managed Agents for the Gemini API at I/O 2026. On the surface, it is another agent feature arriving inside an existing model API. For builders, the more important shift is that Google is starting to sell agent execution as a hosted runtime: sandbox, state, filesystem, web access, tool loops, and cost visibility bundled behind an API surface.

Google's pitch is simple. A developer calls interactions.create once. The Antigravity agent then reasons, uses tools, runs code, manages files, and browses the web inside an isolated Linux environment hosted by Google. The agent is based on Gemini 3.5 Flash and uses the same harness as the Google Antigravity IDE. In other words, pieces of the agent execution stack that teams have been assembling from local terminals, Docker containers, browser automation, file synchronization, and tool orchestration are moving behind a managed API.

That is why this story is not just "Google launched another coding agent." devlery has already covered nearby I/O announcements such as Gemini Spark, the Antigravity CLI transition, Android Agent Skills, and the AWS MCP Server. This release sits at a different boundary. Managed Agents is about the API product layer. Google wants AI app developers to attach remote sandbox workers to their products without building agent infrastructure from scratch. Just as serverless hid server operations, Google is trying to hide agent runtime operations.

[Image: Google I/O 2026 developer tools announcement]

Why this looks like serverless for agents

The word "serverless" is an analogy, but it helps explain the product move. AWS Lambda abstracted away server provisioning, process lifetime, and scaling for event-driven code. Managed Agents abstracts the execution environment, state reuse, tool execution, and file handling needed for agent work. Developers no longer have to start from container orchestration, dependency installation, generated file persistence, and monitoring code for long-running tasks.

Google's official blog says production-grade agents previously required complex infrastructure to build and manage isolated sandboxes. It presents Gemini Managed Agents as a way to abstract that complexity so teams can focus on product experience and agent behavior. That may sound like launch copy, but it describes a real operational problem. Calling a model API once is easy. Letting an agent execute code, create intermediate files, fetch data from the web, recover from multi-step failures, and continue in the same workspace on the next request quickly becomes an infrastructure project.

That is the problem Managed Agents is targeting. Google provides the default Antigravity agent as a general-purpose managed agent, and developers can shape behavior with markdown files such as AGENTS.md and SKILL.md or with inline configuration. Anthropic Claude Code, OpenAI's agent tooling, and GitHub Copilot already use repository instructions and skill-like concepts in their products. Google is now putting a similar customization surface directly into the API.

The Interactions API changes the unit of work

Managed Agents runs on the Gemini API's Interactions API. That API is a signal in its own right. Google describes Interactions as a new standard optimized for agentic workflows, server-side state management, and complex multimodal multi-turn conversations. The older generateContent API remains supported, but Google also says some new models and agentic capabilities will land on the Interactions side.

The unit of invocation changes. In a typical chat completion or content generation flow, the mental model is still "one input, one output." An Interaction is a record of a conversation turn or task. It can include model thoughts, tool calls, tool results, and final model output as ordered execution steps. Developers can read only the final text, but richer products and debugging tools can render or trace the intermediate steps.

That structure matters in agent products. When a user asks, "Analyze this CSV and create a report," the real execution may involve web search, file creation, Python runs, validation, retries, and summarization. If the UI is just a spinner, the user cannot tell where the agent is stuck or why it failed. The Interactions API step model is what lets a product show what the agent is doing and measure where time and cost are accumulating.
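To make the step model concrete, here is a minimal sketch of how a product might tally time and tokens per step type. The record shape (type, seconds, tokens), the type labels, and the sample trace are assumptions for illustration, not the Interactions API schema; real step objects would have to be mapped into this form first.

```python
from collections import defaultdict


def summarize_steps(steps):
    """Aggregate count, elapsed seconds, and tokens for each step type."""
    totals = defaultdict(lambda: {"count": 0, "seconds": 0.0, "tokens": 0})
    for step in steps:
        bucket = totals[step["type"]]
        bucket["count"] += 1
        bucket["seconds"] += step["seconds"]
        bucket["tokens"] += step["tokens"]
    return dict(totals)


# Hypothetical trace of one run; every value here is made up.
example_steps = [
    {"type": "thought", "seconds": 1.5, "tokens": 800},
    {"type": "tool_call", "seconds": 4.0, "tokens": 1200},
    {"type": "tool_result", "seconds": 0.5, "tokens": 5000},
    {"type": "tool_call", "seconds": 6.0, "tokens": 1500},
    {"type": "tool_result", "seconds": 0.5, "tokens": 7000},
    {"type": "model_output", "seconds": 2.0, "tokens": 900},
]

summary = summarize_steps(example_steps)
for step_type, stats in summary.items():
    print(step_type, stats)
```

A per-type summary like this is enough to drive a progress view or to spot that, for example, tool results rather than model output dominate token use.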

App server calls interactions.create
↓
Antigravity agent: Gemini 3.5 Flash plus agent harness
↓
Google-hosted Linux sandbox: code, files, web, environment state
↓
Execution steps, files, final output, follow-up interaction

Cost intuition changes

Cost is the first thing teams should watch. Google docs emphasize that the Antigravity agent is not structured like a standard chat request that answers once and stops. A single request can run an autonomous loop with reasoning, tool execution, code execution, and file management. On the surface it looks like "one API call," but internally many tokens and tool calls can accumulate.

The official examples are specific. For research and information synthesis, the docs show 100k-500k input tokens, 10k-40k output tokens, and a typical cost range of $0.30-$1.00. For document and content generation, they show $0.30-$1.30. For data processing and analysis, they show 300k-3M input tokens, 30k-150k output tokens, and $0.70-$3.25. Complex agentic workflows can accumulate 3M-5M tokens and cost up to roughly $5.

Official example task	Input tokens	Output tokens	Typical cost
Research synthesis	100k-500k	10k-40k	$0.30-$1.00
Document generation	100k-500k	15k-50k	$0.30-$1.30
Data analysis	300k-3M	30k-150k	$0.70-$3.25
Complex workflow	3M-5M possible cumulative use	Varies by task	Up to about $5
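The ranges in the table translate directly into quota arithmetic. The sketch below is only an illustration: it uses the typical cost ranges quoted above to estimate how many runs a fixed budget covers at the expensive and cheap ends of each range. The task keys and the function name are hypothetical, and real planning should rely on measured usage rather than these published examples.

```python
# Typical per-run cost ranges from the table above, stored in cents
# so the division below stays in exact integer arithmetic.
TYPICAL_COST_CENTS = {
    "research_synthesis": (30, 100),
    "document_generation": (30, 130),
    "data_analysis": (70, 325),
}


def runs_within_budget(task, budget_usd):
    """Return (worst_case_runs, best_case_runs) for a budget in USD."""
    low_cents, high_cents = TYPICAL_COST_CENTS[task]
    budget_cents = round(budget_usd * 100)
    return budget_cents // high_cents, budget_cents // low_cents


worst, best = runs_within_budget("data_analysis", 100)
print(f"$100 covers between {worst} and {best} data analysis runs")
```

The gap between the two numbers is the point: a quota sized for the cheap end of the range can be exhausted several times faster when runs land at the expensive end.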

The exact numbers matter less than the change in cost model. In a traditional model API product, developers directly control prompt length and output length. In an agent runtime, cost depends on how often the agent searches, how often it runs code, how long it validates failed results, and how many times it rereads files. A user's request can be short while the agent's internal work becomes long.

Any team adding Managed Agents to a product should measure p95 token cost early. Before selling the feature as if it had a fixed price, the product needs streaming step visibility and a way to cancel long-running runs. Google's note that environment compute is not billed during preview also needs careful reading. CPU, memory, and sandbox execution may be free during preview, but underlying Gemini model tokens and tool usage still belong to the billing model.
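As a rough sketch of that measurement, the function below computes a nearest-rank percentile over per-run token totals. The sample totals are invented for illustration; in practice they would come from whatever usage figures a team logs for each run.

```python
def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of
    the data at or below it. pct is an integer from 1 to 100."""
    if not values:
        raise ValueError("values must not be empty")
    if not 1 <= pct <= 100:
        raise ValueError("pct must be between 1 and 100")
    ordered = sorted(values)
    # Integer ceiling of pct * n / 100, which avoids float rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Hypothetical total tokens for ten runs of the same workflow.
run_tokens = [
    120_000, 150_000, 180_000, 200_000, 220_000,
    260_000, 310_000, 450_000, 900_000, 2_800_000,
]

p50 = percentile_nearest_rank(run_tokens, 50)
p95 = percentile_nearest_rank(run_tokens, 95)
print(f"p50={p50:,} tokens, p95={p95:,} tokens")
```

In a sample like this one the median looks comfortable while the tail is more than ten times larger, which is why pricing and quotas are safer when anchored to p95 than to the average run.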

The default network is open

The second key issue is the security boundary. Google's Agents overview says managed agents run in OS-level isolated sandboxes. It also says they have unrestricted outbound network access by default. Developers can use an allowlist to restrict outbound traffic to specific domains or wildcard patterns. That small detail matters. A hosted sandbox does not automatically mean a narrow network boundary.
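One way to reason about an allowlist before configuring it is to audit URLs from a test run against the intended patterns. The sketch below is a client-side illustration only. It assumes shell-style wildcard matching through Python's fnmatch, which may differ from how Google evaluates allowlist patterns, and the function names, patterns, and URLs are all hypothetical.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit


def host_allowed(host, patterns):
    """True if the host matches any allowlist pattern (case-insensitive)."""
    host = host.lower().rstrip(".")
    return any(fnmatchcase(host, pattern.lower()) for pattern in patterns)


def blocked_urls(urls, patterns):
    """Return the URLs whose host is missing or not covered by the patterns."""
    blocked = []
    for url in urls:
        host = urlsplit(url).hostname
        if host is None or not host_allowed(host, patterns):
            blocked.append(url)
    return blocked


# Hypothetical allowlist and hypothetical URLs observed in a test run.
ALLOWLIST = ["*.example.com", "docs.python.org"]
observed = [
    "https://api.example.com/v1",
    "https://docs.python.org/3/",
    "https://example.com.evil.net/x",
    "https://example.com/",
]

print(blocked_urls(observed, ALLOWLIST))
```

Under these assumed semantics the pattern *.example.com covers subdomains but not the bare domain, and a lookalike host that merely starts with example.com is rejected. Both edge cases are worth confirming against the provider's documented behavior.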

Agents are useful because they can read the web and call external APIs. In production workflows, that usefulness becomes a risk surface. If an agent analyzes internal data, writes a report, and consults external documents, the product team needs to know which domains the agent can reach. Once credentials enter the workflow, the agent can use the full scope of those credentials. Google docs also recommend trusted external tools, least-privilege service accounts, and short-lived tokens when attaching external APIs.

This connects directly to the recent MCP and agent security debate. Agents interpret human-readable documents, web pages, repository instructions, and tool schemas together. If external input can steer tool calls, network and credential boundaries become core product design. Managed Agents may reduce infrastructure work, but it does not make it safe to attach broad user API keys, leave unrestricted networking enabled, and hand over production data without review.

Boundary to check	Signal in the docs	Practical question
Network	Outbound networking is unrestricted by default	Have you narrowed access to approved domains with an allowlist?
Credentials	Docs describe ways to avoid exposing credentials directly inside the sandbox	Are tokens short-lived and scoped to least privilege?
State	Paid-tier interactions are retained for 55 days; free-tier interactions for 1 day	Do `store=false` and deletion policies match your product requirements?
Human review	Docs recommend output and action validation for sensitive workflows	Is there approval UX before deployment, payment, or data mutation?

State management is a product decision

The Interactions API uses server-side state by default. If a developer passes previous_interaction_id, they can continue without resending the whole conversation history. That can also help performance and cost through implicit caching. Agent environments can preserve files and state when an environment ID is reused. From a developer experience perspective, this is convenient: Google manages the state needed for long-running and multi-turn work.

But state is also product policy. Google docs say Interaction objects are stored by default: 55 days on the paid tier and 1 day on the free tier. That storage makes server-side state management, background execution, and observability easier. Developers can set store=false, but doing so limits features such as background=true and later use of previous_interaction_id.
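The retention defaults and the store=false tradeoff can be written down as a small policy check. This is a sketch under the assumptions stated in this article (55 days on the paid tier, 1 day on the free tier, and store=false limiting background execution and later continuation). The function names and the needs_follow_up flag are hypothetical and not part of any SDK.

```python
from datetime import datetime, timedelta, timezone

# Default retention periods as described in the docs cited above.
RETENTION_DAYS = {"paid": 55, "free": 1}


def retention_deadline(created_at, tier):
    """Estimate when a stored interaction falls out of default retention."""
    return created_at + timedelta(days=RETENTION_DAYS[tier])


def store_conflicts(store, background=False, needs_follow_up=False):
    """List product features that conflict with choosing store=false."""
    conflicts = []
    if not store:
        if background:
            conflicts.append("background execution relies on stored interactions")
        if needs_follow_up:
            conflicts.append("later previous_interaction_id use needs a stored interaction")
    return conflicts


created = datetime(2026, 6, 1, tzinfo=timezone.utc)
print(retention_deadline(created, "paid").date())
print(store_conflicts(store=False, background=True, needs_follow_up=True))
```

Running a check like this at design time forces the choice into the open: either the product accepts provider-side retention, or it gives up the features that depend on it.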

Teams should not hide that tradeoff from themselves or their customers. If an agent product processes customer data, the question becomes: what interaction state is stored by the provider, for how long, and why? For internal tooling, 55-day retention may be fine. For finance, healthcare, legal, or security workflows, the answer may differ. Just as teams design data retention when choosing a serverless database, they now need a retention architecture for agent runtimes.

This is not a production-stable endpoint yet

Google is explicit about limitations. The Antigravity agent and Interactions API are preview or beta features. Schemas and features can change. Structured output is not supported. Some generation settings, including temperature, top_p, top_k, stop_sequences, and max_output_tokens, can return 400 errors. file_search, computer_use, google_maps, function_calling, and mcp are not supported yet. Audio, video, and document input are also not supported; the current input types are text and image.
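Because these limitations can surface as 400 errors at request time, a team might check them before sending a request. The sketch below encodes only the names listed in the paragraph above. The argument shapes and the function name are hypothetical, and the list of rejected settings may be incomplete since the docs describe it as "including" these fields.

```python
# Names taken from the limitations described above; the preview may change them.
UNSUPPORTED_GENERATION_SETTINGS = {
    "temperature", "top_p", "top_k", "stop_sequences", "max_output_tokens",
}
UNSUPPORTED_TOOLS = {
    "file_search", "computer_use", "google_maps", "function_calling", "mcp",
}
SUPPORTED_INPUT_TYPES = {"text", "image"}


def preflight(generation_config=None, tools=(), input_types=("text",)):
    """Return human-readable problems found in a planned agent request."""
    problems = []
    for name in sorted(set(generation_config or {}) & UNSUPPORTED_GENERATION_SETTINGS):
        problems.append(f"generation setting may be rejected: {name}")
    for name in sorted(set(tools) & UNSUPPORTED_TOOLS):
        problems.append(f"tool not supported yet: {name}")
    for name in sorted(set(input_types) - SUPPORTED_INPUT_TYPES):
        problems.append(f"input type not supported: {name}")
    return problems


issues = preflight(
    generation_config={"temperature": 0.2, "top_k": 40},
    tools=["mcp"],
    input_types=["text", "audio"],
)
for issue in issues:
    print(issue)
```

Keeping the lists in one place also makes it cheap to update the check as the preview evolves.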

Those limitations are not just disappointments. They locate the product on the maturity curve. Google has exposed the skeleton of an agent runtime, but not every Gemini tool immediately works inside the Antigravity agent. MCP support is especially notable because the same I/O cycle included repeated references to Antigravity, Agent Skills, Spark, and MCP connections. It is easy to assume the Gemini API managed agent can immediately connect to MCP servers. According to the official docs, it cannot yet.

The right posture is not "move all production agents immediately." A better first step is to test internal research pipelines, batch automation, prototypes, and low-risk data workflows. Measure actual cost and failure modes. The feature is still in preview, and the API may break. That makes it too early to put core customer workflows on top of Managed Agents while promising strong stability.

Google's real competition is not only agent frameworks

Managed Agents competes with agent frameworks such as LangGraph, CrewAI, and LlamaIndex, but the bigger competitive map is broader. OpenAI is moving developer tasks into controlled environments through the Agents SDK and Codex cloud workflows. Anthropic is reinforcing the connective tissue between agents, APIs, and CLIs through Claude Code, the MCP ecosystem, and the Stainless acquisition. GitHub is turning GitHub workflow itself into something like an agent runtime through Copilot apps, cloud agents, and review feedback handoff. Vercel is also widening the execution surface for agents with Sandbox, AI Gateway, and deployment workflows.

Google's advantage is that Gemini API, AI Studio, Antigravity, Android, Workspace, and Google Cloud all sit inside one company. If Managed Agents starts in the API, then flows into AI Studio's visual playground, the Antigravity desktop experience, and Gemini Enterprise Agent Platform, Google can offer a full path to build, test, deploy, and connect agents to enterprise systems. That strategy is stickier than a single model benchmark.

Google's weakness also comes from that breadth. The more product surfaces exist, the more terminology and boundaries developers must learn. Antigravity IDE, Antigravity agent, Antigravity SDK, Managed Agents, Interactions API, and Gemini Enterprise Agent Platform need crisp responsibility lines. Developers should quickly understand whether they need an API, an IDE, an SDK, or a Cloud product. If Google fails to clarify those boundaries, adoption can lag even if the runtime is useful.

Developers are watching control, not just convenience

Managed Agents did not trigger a massive standalone debate on Hacker News or GeekNews. The surrounding reaction still shows the direction of concern. GeekNews covered Google's plan to discontinue Gemini CLI for free, Pro, and Ultra users on June 18, 2026 and move them toward Antigravity CLI. Much of that discussion focused on whether Google is pushing existing CLI users into a new agent platform. Managed Agents belongs to the same broader repositioning: Google is moving Gemini from a model API toward an Antigravity-centered agent platform.

Early developer posts and commentary tend to read Managed Agents as a serverless agent runtime. The positive read is friction reduction: one API call gives teams sandboxing, web access, code execution, and file management. The skeptical read focuses on token cost, preview limitations, network defaults, and data retention. Those reactions are not contradictory. Not managing infrastructure is a real benefit, but it also means provider defaults deserve closer inspection.

For teams building AI agent products, these reactions add up to a useful checklist. The demo question is no longer just "can the agent do it?" The harder questions are where it runs, what state remains, how far the network is open, whether users can predict cost, and how a failing tool loop can be stopped. Managed Agents puts all those questions in one product surface.
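The last question, how to stop a failing tool loop, can be sketched as a budget guard that watches streamed steps and triggers cancellation once a limit is crossed. Everything here is hypothetical: the RunBudget class, the per-step token counts, and the cancel callback stand in for whatever streaming and cancellation mechanism a product actually uses.

```python
class RunBudget:
    """Tracks cumulative tokens and step count for one agent run."""

    def __init__(self, max_tokens, max_steps):
        self.max_tokens = max_tokens
        self.max_steps = max_steps
        self.tokens = 0
        self.steps = 0

    def record(self, step_tokens):
        """Add one step; return a stop reason, or None if the run may continue."""
        self.tokens += step_tokens
        self.steps += 1
        if self.tokens > self.max_tokens:
            return "token budget exceeded"
        if self.steps > self.max_steps:
            return "step limit exceeded"
        return None


def consume(step_token_counts, budget, cancel):
    """Feed steps into the budget and call cancel(reason) on the first breach."""
    for tokens in step_token_counts:
        reason = budget.record(tokens)
        if reason is not None:
            cancel(reason)
            return reason
    return None


# Hypothetical run: the third step pushes the total past one million tokens.
budget = RunBudget(max_tokens=1_000_000, max_steps=50)
cancelled = []
result = consume([400_000, 500_000, 300_000, 100_000], budget, cancelled.append)
print(result, budget.steps, budget.tokens)
```

Separate token and step limits catch two different failure shapes: a few very expensive steps, and a long loop of cheap retries.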

What teams can try now

First, measure real token cost on low-risk workflows. Good examples include summarizing public web data, analyzing a sample CSV, or modifying a synthetic repository instead of internal code. Record how a single prompt decomposes into steps, how many tools are called, and how many total tokens are consumed. Without that measurement, user quotas and pricing can drift quickly.

Second, make network allowlists and credential scope part of the default design. A Google-hosted sandbox is convenient, but the docs say outbound networking is open by default. Once a team connects internal APIs, SaaS APIs, or database proxies, the agent's reachable surface becomes the product team's responsibility.

Third, turn state retention into a user-facing policy. previous_interaction_id and reusable environments are useful. If customer data enters the workflow, however, the product needs to connect provider-stored interaction state to terms, admin settings, deletion flows, and compliance review. Legal, security, and platform teams may matter more than model performance here.

Fourth, decide how Managed Agents fits alongside existing frameworks. Teams already using LangGraph or CrewAI for orchestration do not have to replace everything. A practical architecture may delegate specific tasks to Google's hosted Antigravity agent while keeping business logic in the existing backend. For teams without the capacity to build agent infrastructure, an API-hosted sandbox worker may be a fast starting point.

The next fight after model APIs is runtime

Gemini API Managed Agents is still in preview. It has real limits: no MCP support yet, no structured output, and enough instability risk that core production workflows need caution. Still, the announcement matters because Google is defining an AI agent not as "a model that calls tools," but as an API product with a hosted execution environment and persistent state.

AI app competition may soon be harder to explain with model names alone. The more important questions will be which sandbox runs the work, how long state is retained, how tool loops are observed, how network and credentials are constrained, and what unit makes cost spike. Managed Agents is Google's version of that shift. One API call is convenient. But if that one call can include millions of tokens, external networking, retained state, and automated file work, the thing builders need to design is not just the prompt. It is the operating boundary.