Gemini API Managed Agents turn model calls into sandboxed workers

Google Gemini API Managed Agents move model calls into isolated Linux sandboxes with stateful agent execution.

AI summary

What happened: Google introduced Managed Agents for the Gemini API.
- A single API call can start an Antigravity agent that uses tools, runs code, manages files, and browses the web inside an isolated Linux environment.
Why it matters: The model API is expanding from a text generation endpoint into an agent runtime with state and execution infrastructure.
Watch: Public Preview status, default outbound networking, token-heavy interactions, and retention policy all need review before production use.
- Google documentation says a single interaction can often consume 100k-3M tokens.

Among the Gemini announcements at Google I/O 2026 on May 19, the change developers should watch most closely is not the model name alone. It is the shape of the API. Google introduced Gemini API Managed Agents, a way to start an Antigravity agent from a single API call. That agent runs in a Google-hosted isolated Linux environment and can perform reasoning, tool use, code execution, file management, and web browsing.

At first glance this can look like an add-on to the Gemini 3.5 Flash launch. On the same day, Google announced Gemini 3.5 Flash and led with numbers such as 76.2% on Terminal-Bench 2.1, 1656 Elo on GDPval-AA, 83.6% on MCP Atlas, 84.2% on CharXiv Reasoning, and four times the output tokens per second of frontier models. Those numbers are part of the news. But the deeper shift is that the Gemini API is becoming more than a path for returning model responses. It is also becoming an execution layer that lends developers a remote sandbox and an agent harness.

Google describes Managed Agents as an abstraction over the complex infrastructure, scaffolding, and isolated sandbox management needed to build production-grade agents. The message to developers is that they can focus on agent behavior and product experience. That also implies a meaningful transfer of ownership. Many AI teams have been stitching together model APIs, custom orchestration code, container sandboxes, browser automation, credential proxies, logging, and retry queues. Google is saying that part of that stack can move inside the Gemini API.

The API shape matters more than the model launch

Gemini 3.5 Flash is clearly the engine behind this announcement. Google describes it as a new Flash-family model tuned for agentic workflows and coding. Gemini 3.5 Pro is scheduled for the following month, while 3.5 Flash is available immediately across the Gemini app, Search AI Mode, Antigravity, the Gemini API, AI Studio, Android Studio, and Gemini Enterprise Agent Platform.

Figure: Gemini 3.5 Flash benchmark comparison

For API developers, though, the larger question is not just "how fast is this model?" It is "what runtime does this model run inside?" Long-running agents need more state than a simple completion. They need a file system, command execution, recovery after failed steps, and visibility into intermediate tool calls and reasoning steps. If they browse the web, they need network policy. If they call external APIs, they need credential boundaries.

Managed Agents productizes that problem. Google's launch post says a single call can provision a remote Linux environment, then let an Antigravity agent plan, invoke tools, execute code, manage files, and browse the web inside it. Each interaction can create a new environment or continue from an existing one. That means follow-up calls can preserve files and state.

This is where the Gemini API starts to behave differently from a traditional model API. The older mental model was mostly "send input, receive output." Function calling, code execution, and tool use have already complicated that model, but the application usually remained responsible for the execution environment and durable state. Managed Agents moves closer to "assign work to a remote worker." The model is not only answering. It is touching files and tools inside a Google-hosted sandbox.

Interactions API laid the foundation

This announcement did not arrive in isolation. Google introduced the Interactions API in December 2025 as a foundation for handling models and agents through the same interface. The basic premise was that generateContent fits stateless request-response workloads, while agentic applications need a data model for interleaved messages, thoughts, tool calls, and state.

The Google AI documentation updated on May 19, 2026 is more direct. It recommends the Interactions API as the standard for new Gemini projects and describes it as optimized for agentic workflows, server-side state management, and complex multimodal multi-turn conversations. generateContent remains supported, but Google also says new agentic capabilities and tools may come exclusively to the Interactions API first.

That is a product strategy, not just an SDK detail. Instead of stretching a completion API until it resembles an agent runtime, Google is centering the Interaction resource. One interaction represents a complete conversational or task turn and stores a time-ordered record of execution steps. Those steps can include model thoughts, server-side and client-side tool calls, tool results, and final model output. Developers can use convenience fields such as interaction.output_text, but more advanced applications can iterate over the steps timeline to render searches, tool calls, and intermediate reasoning for debugging or user-facing transparency.
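To make the steps timeline concrete, here is a minimal sketch of walking one. It uses plain Python dicts rather than the real SDK. The step type names (`thought`, `tool_call`, `tool_result`, `model_output`), the field names, and the helper functions are all assumptions for illustration, not the documented schema.

```python
from typing import Any

# Hypothetical interaction record. The real Interaction schema may differ.
interaction: dict[str, Any] = {
    "id": "interaction-1",
    "steps": [
        {"type": "thought", "text": "Need current docs."},
        {"type": "tool_call", "name": "web_search", "args": {"q": "retention policy"}},
        {"type": "tool_result", "name": "web_search", "result": "3 pages found"},
        {"type": "model_output", "text": "Retention is configurable."},
    ],
}


def render_timeline(record: dict[str, Any]) -> list[str]:
    """Turn the time-ordered steps into one readable line per step."""
    lines = []
    for index, step in enumerate(record.get("steps", []), start=1):
        kind = step.get("type", "unknown")
        if kind == "tool_call":
            detail = f"{step['name']}({step.get('args', {})})"
        elif kind == "tool_result":
            detail = f"{step['name']} -> {step.get('result')}"
        else:
            detail = step.get("text", "")
        lines.append(f"{index:02d} [{kind}] {detail}")
    return lines


def output_text(record: dict[str, Any]) -> str:
    """Imitate a convenience accessor by joining the model output steps."""
    return "".join(
        step.get("text", "")
        for step in record.get("steps", [])
        if step.get("type") == "model_output"
    )


if __name__ == "__main__":
    for line in render_timeline(interaction):
        print(line)
    print("final:", output_text(interaction))
```

A debugging view and a user-facing transparency view can share the same loop and differ only in which step types they choose to show.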

| Layer | Traditional model API feel | Managed Agents feel |
| --- | --- | --- |
| Call unit | Prompt and response | Interaction and execution steps |
| State management | Client resends history | Server state continues through `previous_interaction_id` |
| Execution environment | Application operates its own sandbox and workers | Google-hosted isolated Linux sandbox |
| Observability | Teams design logs and custom traces | Typed steps can be rendered or debugged |

The table points to an abstraction shift. A model call is no longer only the unit of text generation. Google is turning an interaction into a task record, connecting agents to environments, and treating state plus execution trace as API resources.

AGENTS.md and SKILL.md become part of the API surface

One striking part of the Managed Agents announcement is the customization model. Google says developers can define instructions and skills in markdown files such as AGENTS.md and SKILL.md, then register those files as a managed agent instead of writing complex orchestration code by hand. This pattern is already familiar in local coding-agent workflows. Project-level instruction files, reusable skills, and task-specific templates change how an agent behaves.

The interesting part is that this file-based convention is moving into an API-managed runtime. The "project instructions" grammar seen in local Codex, Claude Code, Antigravity CLI, and other agent frameworks is becoming a deployable service agent definition. Instead of sending one prompt string, developers can manage agent definitions as versioned files. Those files can become part of code review, change history, and deployment pipelines.

That changes AI application operations. Prompts and tool schemas have often been hidden in application code or scattered across dashboard settings. If files such as AGENTS.md and SKILL.md become official primitives, agent behavior becomes a more explicit artifact. Expected permissions, procedures, output formats, tool preferences, and failure rules can be reviewed at the file level.

File-based configuration does not automatically create safety. Instruction files can also become attack surfaces. If project instructions combine with external tool calls or credential use, prompt injection and supply-chain risk become more concrete. Managed Agents will therefore be judged less by markdown convenience and more by how that customization model binds to sandboxes, network rules, credential boundaries, and audit trails.

The sandbox is a security boundary, not just a convenience

Google AI docs mark Managed Agents as Public Preview and say all agents run in OS-level isolated sandboxes. The same docs also include an important default: environments have unrestricted outbound network access unless developers configure an allowlist for specific domains or wildcard patterns.

That line matters during adoption review. A Google-hosted sandbox can help protect a local machine, but if networking is allowed by default, external API calls, data transfer, package downloads, and web browsing all become part of the agent's possible behavior. Combining browsing with code execution is powerful, but it also expands the radius for mistakes and attacks. Teams should design network allowlists and credential scopes before obsessing over model benchmark deltas.
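One way to reason about an allowlist before configuring it is to prototype the matching logic locally. The sketch below is a deny-by-default host check using only the Python standard library. The domains are placeholders, and the wildcard semantics (shell-style matching through `fnmatchcase`) are an assumption. The platform's actual pattern rules may differ.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

# Hypothetical allowlist: exact domains plus wildcard patterns.
ALLOWLIST = ["api.example.com", "pypi.org", "*.pypi.org"]


def is_allowed(url: str, allowlist: list[str] = ALLOWLIST) -> bool:
    """Return True only if the URL's host matches an allowlist pattern."""
    host = (urlsplit(url).hostname or "").lower()
    if not host:
        # No parseable host (for example a missing scheme): deny by default.
        return False
    return any(fnmatchcase(host, pattern.lower()) for pattern in allowlist)


if __name__ == "__main__":
    for candidate in (
        "https://api.example.com/v1/orders",
        "https://files.pypi.org/packages/x.whl",
        "https://evilpypi.org/x.whl",
        "https://pypi.org.attacker.test/x.whl",
    ):
        print(candidate, is_allowed(candidate))
```

Note that `*.pypi.org` does not cover the bare `pypi.org` in this sketch, so both entries are listed. Lookalike hosts such as `evilpypi.org` fail the check because the pattern requires a literal dot before the domain.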

Credentials need the same discipline. Google documentation recommends using trusted tools and least privilege when connecting external tools and APIs. Credentials can be injected through egress proxy header transformation without being exposed directly inside the sandbox. But the agent can still use credentials it is allowed to access. The central permission question becomes: which credential should this agent receive?

In practice, three principles follow. First, use short-lived tokens instead of long-lived keys. Second, give service accounts and API keys only the permissions needed for the specific job. Third, narrow network paths with task-specific allowlists instead of leaving them open by default. Managed Agents can create the sandbox for you. They do not decide your trust boundary for you.

Cost and retention are not visible from the model price card alone

Managed Agents pricing is also harder to reason about than simple input and output token pricing. Google says Public Preview managed agents use pay-as-you-go billing based on Gemini model tokens and tool usage. A single interaction can trigger multiple reasoning loops and, according to the docs, often consume 100k to 3M tokens. Environment compute is not billed during the preview period, but that condition may change at general availability.

That token range is the important part. When an agent reads files, browses the web, executes code, and recovers from failure, token usage can grow quickly. A request like "fix this app" is not one completion. It becomes planning, search, execution, log interpretation, retry, and summary across several steps. Cost forecasting depends less on prompt length and more on iteration count and tool behavior.
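A back-of-the-envelope calculation shows why the range matters. The sketch below applies a single blended token price to both ends of the 100k to 3M range. The price constant is a placeholder, not a published Gemini rate, and the function names are hypothetical. Real billing also separates input tokens, output tokens, and tool usage.

```python
def interaction_cost(tokens: int, usd_per_million_tokens: float) -> float:
    """Cost of one interaction at a single blended token price."""
    return tokens / 1_000_000 * usd_per_million_tokens


def monthly_cost(
    interactions_per_day: int,
    tokens_each: int,
    usd_per_million_tokens: float,
    days: int = 30,
) -> float:
    """Project a month of identical interactions."""
    return interactions_per_day * days * interaction_cost(
        tokens_each, usd_per_million_tokens
    )


# Placeholder only: NOT a published Gemini price.
ASSUMED_USD_PER_MILLION = 1.00

LOW_TOKENS = 100_000     # lower end of the range quoted in the docs
HIGH_TOKENS = 3_000_000  # upper end of the range quoted in the docs

low_cost = interaction_cost(LOW_TOKENS, ASSUMED_USD_PER_MILLION)
high_cost = interaction_cost(HIGH_TOKENS, ASSUMED_USD_PER_MILLION)
spread = HIGH_TOKENS / LOW_TOKENS  # ratio is independent of the price

if __name__ == "__main__":
    print(f"per interaction: ${low_cost:.2f} to ${high_cost:.2f}")
    print(f"spread between the two ends: {spread:.0f}x")
    print(f"200/day at the high end: ${monthly_cost(200, HIGH_TOKENS, ASSUMED_USD_PER_MILLION):,.2f} per month")
```

Whatever the real price turns out to be, the two ends of the range differ by a factor of 30, which is why iteration count dominates the forecast.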

Retention policy also affects product design. The Interactions API docs say store=true is the default because Interaction objects are saved for server-side state management, background execution, and observability. Paid Tier retention is 55 days, while Free Tier retention is 1 day. Developers can set store=false, but that setting is not compatible with background=true and prevents continuation with previous_interaction_id.
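These constraints can be checked on the client before a request is sent. The sketch below encodes only the two rules stated above. `PlannedCall`, `wants_follow_up`, and `check_call` are hypothetical names for illustration and are not part of any SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlannedCall:
    """Hypothetical description of a call an application intends to make."""

    store: bool = True             # default according to the docs cited above
    background: bool = False
    wants_follow_up: bool = False  # will a later call continue from this one?


def check_call(call: PlannedCall) -> list[str]:
    """Return the conflicts implied by the documented store semantics."""
    problems = []
    if not call.store and call.background:
        problems.append("store=false is not compatible with background=true")
    if not call.store and call.wants_follow_up:
        problems.append(
            "store=false prevents continuation with previous_interaction_id"
        )
    return problems


if __name__ == "__main__":
    print(check_call(PlannedCall()))
    print(check_call(PlannedCall(store=False, background=True, wants_follow_up=True)))
```

A privacy-sensitive workflow that sets store=false therefore has to run in the foreground and resend its own history on each turn.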

That creates a clear tradeoff between privacy and convenience. Letting the server hold state can reduce history resends and may help performance and cost through implicit caching. Sensitive workflows, however, require review of retention windows, deletion APIs, compliance boundaries, and data processing terms. Teams that send customer data, proprietary code, or financial documents to an agent cannot treat retention as a footnote.

- 100k-3M: typical token range for one interaction
- 55 days: Paid Tier interaction retention
- 7 days: inactive environment deletion threshold

These numbers are why Managed Agents should be viewed as operational infrastructure, not only as a convenient agent API. Calls can become expensive. State can be stored. Environment lifecycle can affect product behavior.

Antigravity moves from IDE surface to API primitive

On the same day, Google also used its I/O developer highlights to announce the Antigravity 2.0 desktop app, Antigravity CLI, Antigravity SDK, and Gemini Enterprise Agent Platform integration. When combined with the recent move from Gemini CLI for individual users to Antigravity CLI, Google's direction is consistent. Antigravity is being positioned less as a single product and more as the center of an agent harness.

Managed Agents fit into that strategy. Google says they are powered by the Antigravity agent harness and optimized for Gemini 3.5 Flash. The developer surfaces differ. The desktop app coordinates multiple agents. The CLI creates agents from the terminal. The SDK provides programmatic access to the same harness. Gemini API Managed Agents expose that harness as a cloud sandbox and API primitive.

The strength of this approach is coherence. If the model, agent harness, API, AI Studio, Android, Firebase, Search, and Gemini Enterprise all connect to the same workflow, agentic development can scale quickly inside the Google ecosystem. It becomes plausible to prototype in AI Studio, bring the work into Antigravity, turn it into a Managed Agent through the API, and attach native Workspace API calls.

The weakness is also clear. Developers step deeper into Google's runtime abstraction. Agent behavior, state, sandboxing, token usage, network rules, credential injection, and retention all become sensitive to platform policy and documentation changes. The Interactions API is itself in Beta, and the docs warn that features and schemas may face breaking changes. For production workloads where stability is the main constraint, generateContent may still be the safer default.

Search points in the same direction

Google's Search announcements also connect Gemini 3.5 Flash and Antigravity. The company is upgrading AI Mode's default model to Gemini 3.5 Flash and previewing features that generate custom UI, visual tools, simulations, dashboards, and trackers inside Search. Some generative UI features are planned for free Search users this summer, while custom experiences start first with Google AI Pro and Ultra users in the United States.

That may sound like a consumer product story, but it gives developers a useful signal. Google's agentic future is not "a chatbot answers." It is a system that composes mini apps, dashboards, trackers, and simulations around a user's question, then keeps them updated with fresh sources and tools. A model alone cannot do that. The platform needs execution environments, state, data access, UI generation, and a tool trace that can be inspected.

Managed Agents are the developer-side version of that idea. If Search tells users that an agent can build a mini app around their question, the Gemini API tells developers to start and continue a sandboxed agent through a single API surface. Both announcements share the same platform imagination. The model becomes a worker, and the API becomes an execution coordinator rather than a string-returning endpoint.

Community skepticism sits between benchmarks and product experience

Google's announcement contains many benchmarks and partner references, including Shopify, Macquarie Bank, Salesforce, Ramp, Xero, and Databricks. Agent products, however, are hard to judge from benchmarks alone. In related Hacker News discussions about Gemini, some developers acknowledged strong benchmark results while also saying the lived Antigravity experience did not match Claude Code or Codex for their daily work. The wording can be blunt, but the underlying point is useful: model performance and day-to-day agent experience are separate questions.

That is the point to watch with Managed Agents. For Google's remote sandbox and agent harness to become a good developer experience, the API needs to be stable, logs and steps need to be understandable, and recovery from failed work needs to be predictable. Network allowlists, credential injection, cold starts, token cost, and retention controls must show up as product ergonomics, not just documentation sections.

The phrase "single API call" is therefore double-edged. It lowers the barrier to starting. But if that call consumes 100k-3M tokens, runs through multiple tool loops, browses the external web, and calls APIs with attached credentials, operators need to know what happened inside it. Agent convenience has to travel with traceability.

What development teams should check now

First, evaluate the Interactions API for new Gemini projects. Google recommends it as the standard for new work, but it also labels the API as Beta and warns about breaking changes. If production stability dominates, staying with generateContent may be right. If agentic workflows and server-side state are central, the Interactions API is the path to test.

Second, do not treat Managed Agents as merely "an LLM that can run code." This is a runtime that includes sandboxing, networking, credentials, state, lifecycle, and billing. Teams need to design allowlist defaults, egress policy, service account scopes, token budgets, and deletion policy.

Third, prepare to manage agent definitions as files. AGENTS.md and SKILL.md can become deployable behavior definitions rather than informal prompt notes. That means code review, linting, secret scanning, prompt-injection review, and versioning become part of agent operations. AI teams, security teams, and platform teams will look at the same files from different angles.
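As a sketch of what a pre-merge check on an agent definition file could look like, the snippet below flags lines that appear to embed credentials. The two regular expressions are illustrative heuristics chosen for this example, and `scan_definition` is a hypothetical helper. A dedicated secret scanner and a prompt-injection review would still be needed.

```python
import re

# Illustrative heuristics only, not a complete secret-detection rule set.
SUSPICIOUS = [
    (
        "inline credential",
        re.compile(r"(?i)(api[_-]?key|secret|token|password)\s*[:=]\s*\S{8,}"),
    ),
    (
        "private key block",
        re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    ),
]


def scan_definition(text: str) -> list[tuple[int, str]]:
    """Return (line number, finding label) pairs for suspicious lines."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for label, pattern in SUSPICIOUS:
            if pattern.search(line):
                findings.append((line_no, label))
    return findings


if __name__ == "__main__":
    sample = (
        "# AGENTS.md\n"
        "Use the billing tool for refunds.\n"
        "api_key = EXAMPLE1234567890\n"
        "Never print a token to the user.\n"
    )
    for line_no, label in scan_definition(sample):
        print(f"line {line_no}: {label}")
```

Running a check like this in the same pipeline that reviews application code is one way to treat AGENTS.md and SKILL.md as deployable artifacts rather than informal notes.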

Fourth, add cost observability from the start. In a structure where one interaction can expand through several loops, tail cost matters more than average token use. Failed work, infinite retries, large repository scans, web browsing loops, and repeated generated UI revisions can all move the bill. Teams should budget by task type, not only by model price sheet.
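The difference between average and tail usage is easy to see with a small aggregation. The sketch below groups token counts by task type and reports the mean, a nearest-rank 95th percentile, and the maximum. The task names and token counts are made-up sample data, and the helper names are hypothetical.

```python
import math
from collections import defaultdict


def percentile(values: list[int], pct: float) -> int:
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(records: list[tuple[str, int]]) -> dict[str, dict[str, float]]:
    """Group (task_type, tokens) records and report mean, p95, and max."""
    by_type: dict[str, list[int]] = defaultdict(list)
    for task_type, tokens in records:
        by_type[task_type].append(tokens)
    return {
        task_type: {
            "mean": sum(tokens) / len(tokens),
            "p95": percentile(tokens, 95),
            "max": max(tokens),
        }
        for task_type, tokens in by_type.items()
    }


if __name__ == "__main__":
    # Made-up sample data: one runaway bugfix interaction among normal ones.
    usage = [
        ("bugfix", 120_000),
        ("bugfix", 150_000),
        ("bugfix", 180_000),
        ("bugfix", 2_400_000),
        ("summary", 100_000),
        ("summary", 110_000),
    ]
    for task_type, stats in summarize(usage).items():
        print(task_type, stats)
```

In this sample the bugfix mean is 712,500 tokens while the p95 is 2,400,000, so a budget set from the average alone would miss the interaction that drives the bill.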

Conclusion: Gemini API is starting to host workers

Gemini API Managed Agents are more than "Google launched an agent API." The more precise story is that the boundary of the model API is moving. Developers can now call not only a model that generates text, but a remote worker that creates files, runs code, and browses the web inside an isolated Linux environment. That worker uses Gemini 3.5 Flash and the Antigravity harness, and it operates on top of Interactions API state and execution steps.

The convenience is real. Teams do not have to build all sandbox infrastructure themselves or design an agent harness from scratch. Defining custom agents through AGENTS.md and SKILL.md also fits naturally with developer workflows. But the operational questions grow with the convenience. How far should networking be allowed? Which credentials should the agent receive? How long should state be retained? What is the token budget for one interaction? How will teams absorb breaking changes in a preview API?

Gemini 3.5 Flash benchmarks are useful context for the announcement. The core news, though, is the cloud-primitive form of agent runtime. Google is widening Antigravity across app, CLI, SDK, Enterprise, and API surfaces in an attempt to change the developer experience from "call a model" to "run a managed worker." The next competitive axis is no longer model quality alone. It is who can offer the safest, most observable, and most predictable execution layer for AI agents.