Devlery

The 24-Hour Agent Permission Problem in Front of 900M Gemini Users

Google Gemini Spark brings background agents, MCP connections, and approval boundaries into a mass-market consumer AI surface.

AI summary
  • What happened: Google introduced Gemini Spark at I/O 2026 as a 24/7 personal AI agent inside the Gemini app.
    • Spark runs on Gemini 3.5 and the Antigravity harness, with background work across Gmail, Docs, Slides, and other Workspace surfaces.
  • The scale: Google says the Gemini app reaches more than 900M monthly users, while the Spark beta starts with U.S. Google AI Ultra subscribers.
  • Why it matters: MCP connectors, custom sub-agents, and a local browser roadmap point to consumer apps becoming long-running agent runtimes.
    • The hard part is not just productivity. It is permission design, approval UX, auditability, and responsibility when agents act across personal and third-party services.

Google's Gemini Spark, announced at I/O 2026, looks at first like another assistant feature in a crowded AI product lineup. It is more consequential than that. Spark is Google's attempt to put a 24/7 personal AI agent inside the Gemini app, with the ability to continue working in the cloud even after a user closes a laptop or locks a phone. Google described it as an agent that can navigate a user's digital life and act under the user's direction.

The reason this matters is distribution. In the same announcement cycle, Google said the Gemini app has more than 900 million monthly users across 230 countries and more than 70 languages. Spark itself is not being opened to that whole surface immediately. Google is starting with trusted testers and a U.S. Google AI Ultra beta. Still, the deployment surface already exists. If consumer agents enter the mainstream not through separate agent apps but through Gmail, Docs, Slides, Calendar, Chrome, and desktop integrations people already use, Spark is one of the clearest signals of that shift.

For builders, the story is not simply that Google made a personal assistant. Spark is built on Gemini 3.5 and Google's Antigravity harness. It is tied to Workspace tools, MCP connections to services such as Canva, OpenTable, and Instacart, and a roadmap that includes custom sub-agents, messaging or emailing Spark, and local browser control. In other words, Spark is not a single chat feature. It is a consumer agent runtime made from a model, a harness, app connectors, approval UI, and eventually browser automation.

Spark changes the question from answers to permissions

Google's examples are intentionally ordinary. Spark can parse monthly credit card statements to find new or hidden subscriptions. It can read school emails, pull important deadlines, and send a daily digest to a family. It can gather scattered meeting notes from email and chat, create a Google Docs document, and draft the accompanying email. That sounds like the familiar productivity pitch: remove tedious coordination work.

Look one layer deeper, and the core issue is permission. To inspect credit card statements, the agent needs access to financial emails or attachments. To summarize school notices, it touches family information and calendars. To turn meeting notes into a document, it may cross Gmail, chat, Drive, and Docs. To draft an email, it approaches the boundary between private generation and outward action. Google's phrase "under your direction" matters because useful agents need wider read and write scopes, and wider scopes create product and safety obligations.

Google says Spark is designed so users choose whether to turn it on, choose which apps to connect, and are asked before high-risk actions such as spending money or sending email. That is the right direction. The open question is how granular that promise turns out to be in daily use. Asking "may I send this email?" is not the same as showing the recipient, the evidence used, the exact body, the attachments, the external services contacted, and what data leaves the Google account. For an agent operating across personal apps, approval is not a checkbox. It is the product surface where trust is either earned or lost.
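
The difference between a yes/no prompt and a full preview can be made concrete with a structured approval payload. The following is only a minimal sketch: `EmailApproval`, its fields, and `render_preview` are hypothetical names derived from the list above, not Spark's actual approval schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmailApproval:
    """Hypothetical preview payload shown before an agent sends an email."""

    recipients: tuple[str, ...]
    body: str
    evidence: tuple[str, ...] = ()              # sources the agent relied on
    attachments: tuple[str, ...] = ()
    external_services: tuple[str, ...] = ()     # third parties contacted
    data_leaving_account: tuple[str, ...] = ()  # data shared outside the account


def render_preview(req: EmailApproval) -> str:
    """Build the text a user reviews; empty sections are shown as 'none'."""

    def line(label: str, items: tuple[str, ...]) -> str:
        return f"{label}: {', '.join(items) if items else 'none'}"

    return "\n".join(
        [
            line("To", req.recipients),
            line("Based on", req.evidence),
            line("Attachments", req.attachments),
            line("External services", req.external_services),
            line("Data leaving account", req.data_leaving_account),
            "Body:",
            req.body,
        ]
    )


# Example values are invented for illustration.
request = EmailApproval(
    recipients=("family@example.com",),
    body="Field trip forms are due Friday.",
    evidence=("School newsletter email",),
)
print(render_preview(request))
```

Showing empty sections explicitly as "none" is a deliberate choice in this sketch: the user can see that nothing is attached or shared, rather than having to infer it from a missing line.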

| Boundary | Traditional chatbot | Gemini Spark |
| --- | --- | --- |
| Runtime | Centered on the moment a user opens a window and asks | Continues background work in the cloud |
| Task scope | Answers, summaries, and drafts | Workspace, MCP apps, and future browser control |
| Approval point | Mostly user judgment before copy and paste | Product approval UX before high-risk actions |
| Builder concern | Prompts, RAG, and single tool calls | MCP permissions, sub-agents, and long-running telemetry |

Gemini 3.5 and Antigravity behind the consumer app

Spark runs on Gemini 3.5 and the Antigravity harness. Google's I/O roundup positioned Gemini 3.5 Flash as stronger than Gemini 3.1 Pro on coding and agentic benchmarks, citing Terminal-Bench 2.1, GDPval-AA, and MCP Atlas scores. Pairing Spark with those systems is not incidental. A 24-hour agent must do more than answer a single prompt. It needs to plan across steps, call tools, recover from failures, maintain state, and know when to stop.

Antigravity has already been part of Google's agentic development platform story. If Spark brings that harness into the consumer Gemini app, the line between developer agents and personal productivity agents gets blurrier. A coding agent and a personal assistant now face many of the same engineering questions: which tool should the model call, where should long-running state live, how should intermediate work be checked, and when should the user be asked for approval?

That creates pressure for AI app developers. If Spark brings MCP connections into a mass consumer surface, external services need to think about APIs for people and APIs for agents at the same time. Canva, OpenTable, and Instacart are not just integration logos. They become examples of services that may expose editing, reservation, ordering, and pre-payment workflows to an agent. If more services ship MCP servers, product teams will need to design for the difference between a human clicking through a UI and a background agent periodically calling tools.

Figure: Google's official Gemini Spark partner image.

What happens when MCP enters consumer apps

Until now, MCP has mostly been discussed in developer-tool contexts. Claude Desktop, Claude Code, Cursor, OpenAI-style agents, and local tool servers made MCP feel like infrastructure for builders. Spark changes the framing by presenting MCP as a consumer-app expansion path. Google says Canva, OpenTable, and Instacart MCP connections are launching, with Spark's ability to use those connections for work coming in the following weeks.

That move is subtle but important. To consumers, it may look like Gemini can book a restaurant, shop for groceries, or edit design assets. To builders, it raises questions about permission requests, data scope, execution confirmation, and rollback paths. Searching for OpenTable candidates is low risk. Confirming a reservation at a specific time is higher risk. Booking something with a no-show fee or using payment details is high risk. The same split exists in grocery workflows. Building a cart and placing an order are not the same action.

So Spark adds new requirements to the MCP ecosystem. It is no longer enough for a tool schema to say, in natural language, what arguments a function accepts. Agent runtimes need to understand which calls are read-only, which mutate external state, which trigger cost, which share user data with third parties, and which can be reversed. Many current agent tool descriptions rely heavily on prose. That may work inside a narrow developer setup. It is less convincing when the agent surface sits in front of hundreds of millions of users and runs in the background.
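
As a rough illustration of what machine-inspectable side-effect metadata could look like, the sketch below attaches explicit flags to each tool and derives a coarse risk tier from them. Everything here is an assumption for illustration: `ToolRisk`, `risk_tier`, the tier thresholds, and the three tool names are hypothetical and do not describe any real MCP server or Spark's classification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolRisk:
    """Hypothetical machine-readable side-effect flags for one tool."""

    name: str
    mutates_external_state: bool
    incurs_cost: bool
    shares_user_data: bool
    reversible: bool


def risk_tier(tool: ToolRisk) -> str:
    """Map side-effect flags to a coarse tier (thresholds are an assumption)."""
    if tool.incurs_cost or (tool.mutates_external_state and not tool.reversible):
        return "high"
    if tool.mutates_external_state or tool.shares_user_data:
        return "elevated"
    return "low"


# Hypothetical tools mirroring the restaurant example: search, confirm, pay.
TOOLS = [
    ToolRisk("search_restaurants", False, False, False, True),
    ToolRisk("confirm_reservation", True, False, True, True),
    ToolRisk("book_with_no_show_fee", True, True, True, False),
]

for tool in TOOLS:
    print(f"{tool.name}: {risk_tier(tool)}")
```

The point of the sketch is that the runtime decides from declared flags rather than from parsing a prose description, so the same rule applies to every connected service.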

Google's strength is distribution, and so is its risk

Spark's biggest advantage is Google's product footprint. Gmail, Docs, Slides, Calendar, Drive, Android, Chrome, and the Gemini app for macOS are already places where people live and work. A user may not need to install a separate agent tool or learn a new operating model. They can turn on agentic behavior inside the surfaces that already hold their messages, documents, meetings, and files. OpenAI's ChatGPT Agent and Anthropic's Claude Cowork are important competitors, but Google starts from the junction between account data and everyday productivity tools.

That same strength creates a sharper trust problem. When an independent agent app makes a mistake, a user can mentally isolate the failure to that app. When an agent deeply integrated with Gmail, Calendar, and Docs makes a mistake, it can feel like a Google account-level trust failure. A misdirected email, a sensitive document reference, a bad reservation, an unnecessary purchase, or an exposed personal schedule is not just a model error. It becomes a product responsibility issue.

The limited launch makes sense in that light. Spark is starting with trusted testers and U.S. Google AI Ultra subscribers. This is not only a feature maturity question. It is also a controlled experiment in trust. AI Ultra is a high-end tier rather than a broad free feature, so it would be misleading to say 900 million Gemini users suddenly have a 24-hour agent. The more precise claim is that Google is testing a sensitive agent capability inside an app that already has a massive user surface.

Daily Brief is easier than Spark

Google also announced Daily Brief in the same broader product wave. Daily Brief is an agent that reads signals from Gmail, Calendar, and Tasks to summarize the day and suggest next steps. It is rolling out first to U.S. Google AI Plus, Pro, and Ultra subscribers who opt into Google app connections. This is easier for users to understand than Spark. It reads, summarizes, prioritizes, and suggests. Until the user acts, it usually does not change the outside world.

Spark is different. Google's own framing places it in the move from information to action. That distinction matters for product design. A reading agent can summarize badly, and the user can often correct for it before acting. A writing or acting agent can leave traces in the world. Emails get sent, reservations get created, documents get shared, purchases happen. For Spark to succeed, the approval and recovery experience may matter more than a model benchmark.

Teams building their own AI products should apply this distinction directly. "The AI handles it for you" sounds attractive, but the permission model has to break work into levels: read-only access, draft creation, internal state changes, external state changes, and irreversible or costly actions such as sending, sharing, booking, or paying. Each level needs different logs, explanations, cancellation controls, approval rules, and retry behavior. Spark may become an important reference point for how mainstream users expect those boundaries to work.
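
The five levels above can be sketched as an ordered enum with a per-level policy. This is a minimal sketch under stated assumptions: `ActionLevel`, `Policy`, `POLICIES`, and `decide` are hypothetical names, and the specific policy values (which levels ask, retry, or allow cancellation) are illustrative choices, not a description of how Spark behaves.

```python
from dataclasses import dataclass
from enum import IntEnum


class ActionLevel(IntEnum):
    """Ordered from least to most consequential."""

    READ_ONLY = 1
    DRAFT = 2
    INTERNAL_CHANGE = 3
    EXTERNAL_CHANGE = 4
    IRREVERSIBLE_OR_COSTLY = 5


@dataclass(frozen=True)
class Policy:
    requires_approval: bool
    auto_retry: bool
    cancellable: bool


# Assumed policy table; a real product would tune every cell.
POLICIES = {
    ActionLevel.READ_ONLY: Policy(False, True, True),
    ActionLevel.DRAFT: Policy(False, True, True),
    ActionLevel.INTERNAL_CHANGE: Policy(False, False, True),
    ActionLevel.EXTERNAL_CHANGE: Policy(True, False, True),
    ActionLevel.IRREVERSIBLE_OR_COSTLY: Policy(True, False, False),
}


def decide(level: ActionLevel, user_ceiling: ActionLevel) -> str:
    """Return 'deny', 'ask', or 'run' for an action at the given level."""
    if level > user_ceiling:
        return "deny"  # the user never granted this level
    return "ask" if POLICIES[level].requires_approval else "run"


ceiling = ActionLevel.EXTERNAL_CHANGE
for level in ActionLevel:
    print(level.name, decide(level, ceiling))
```

Using an ordered enum lets a user grant express a ceiling ("nothing beyond external changes") separately from the approval rule that applies within that ceiling.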

Expectations and skepticism

Early community reaction appears split. One side sees Spark as a meaningful step toward an agentic web brought into mainstream products. That optimism is understandable. Google's product graph is unusually broad, and Spark could act on top of data and services users already connected years ago.

The other side worries about accountability. A Reddit discussion around Google I/O argued that the event showed an agentic web without showing clearly who is responsible when agents act. That is not just vague anxiety. If an agent reads Gmail, adjusts Calendar, calls external MCP services, and eventually controls a local browser, failures become hard to attribute. Was the problem model judgment, a vague tool schema, an MCP server's permission design, an overly broad user grant, or an approval UI that hid the risk?

For developers, that skepticism is useful. A demo that proves the model can do something is not enough. Agent products need to answer what users can see after a failure, what logs remain, how permissions can be revoked, how background work is surfaced, and what can be rolled back. Spark's mainstream arrival will likely raise user expectations for these controls across the whole agent ecosystem.
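
One way to reason about "what logs remain" and "what can be rolled back" is an action journal that stores an optional undo step next to each recorded action. The sketch below is hypothetical throughout: `ActionJournal`, `JournalEntry`, and the example actions are invented for illustration and assume that reversible actions can supply their own undo callback.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class JournalEntry:
    action: str
    app: str
    undo: Optional[Callable[[], None]] = None  # None means not reversible
    undone: bool = False


class ActionJournal:
    """Append-only record of agent actions with best-effort rollback."""

    def __init__(self) -> None:
        self.entries: list[JournalEntry] = []

    def record(
        self, action: str, app: str, undo: Optional[Callable[[], None]] = None
    ) -> JournalEntry:
        entry = JournalEntry(action, app, undo)
        self.entries.append(entry)
        return entry

    def rollback(self) -> list[str]:
        """Undo reversible entries newest-first; return actions left in place."""
        stuck: list[str] = []
        for entry in reversed(self.entries):
            if entry.undone:
                continue
            if entry.undo is None:
                stuck.append(entry.action)
            else:
                entry.undo()
                entry.undone = True
        return stuck


# Invented example: a calendar event can be removed, a sent email cannot.
calendar = {"dinner"}
journal = ActionJournal()
journal.record("create_event", "Calendar", undo=lambda: calendar.discard("dinner"))
journal.record("send_email", "Gmail")
not_undone = journal.rollback()
print(not_undone, calendar)
```

Returning the actions that could not be undone gives the product something concrete to surface to the user after a failure, instead of a silent partial rollback.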

What to watch next

The first thing to watch is Spark's approval UX. Google says Spark asks before high-risk actions. The open question is what "high-risk" means in the beta. Sending email, sharing documents, booking appointments, spending money, passing data to external apps, and manipulating a browser all carry different kinds of risk. The details of classification, preview, diff, and confirmation will matter.

The second is MCP permission structure. Canva, OpenTable, and Instacart are only the start. If more partners connect, MCP servers will need structured ways to express read and write scope, cost, personal-data transfer, and reversibility. If that becomes common practice in consumer integrations, the same discipline may flow back into developer agent tooling.

The third is custom sub-agents. Google says Spark will let users create custom sub-agents. That is a powerful automation layer, but it is also a permission-sprawl layer. Builders should watch whether sub-agents inherit permissions, whether they keep separate logs and memories, and whether users can understand which sub-agent touched which app.
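
To make the inheritance question concrete, here is a minimal sketch of one possible design, in which a sub-agent receives only the intersection of what it requests and what its parent holds, and keeps its own log. This is an assumption for illustration, not how Spark handles sub-agents; `Agent`, `spawn`, `act`, and the scope strings are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    name: str
    scopes: frozenset[str]
    log: list[str] = field(default_factory=list)  # separate log per agent

    def spawn(self, name: str, requested: set[str]) -> "Agent":
        """Create a sub-agent that can never hold more than its parent."""
        return Agent(name, self.scopes & frozenset(requested))

    def act(self, scope: str, action: str) -> None:
        """Record an action, refusing anything outside the granted scopes."""
        if scope not in self.scopes:
            raise PermissionError(f"{self.name} lacks scope {scope}")
        self.log.append(f"{self.name} {scope} {action}")


parent = Agent("spark", frozenset({"gmail.read", "docs.write"}))
# The sub-agent asks for calendar access the parent never had; it is dropped.
child = parent.spawn("digest", {"gmail.read", "calendar.write"})
child.act("gmail.read", "summarize")
print(sorted(child.scopes), child.log)
```

Because each agent writes to its own log, a user or auditor can answer "which sub-agent touched which app" without untangling a shared history.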

The fourth is local browser and macOS integration. If Spark connects to the macOS app and eventually controls a local browser, the boundary between cloud agents and local computer use opens again. Browsers contain logged-in sessions, internal tools, admin consoles, checkout flows, and sensitive documents. Browser automation can make agents far more useful, but without isolation and clear approval it can also become the riskiest permission in the stack.

Spark is news about an operating model

Reading Gemini Spark as "Google's new AI assistant" misses the more interesting part. Google is trying to turn the Gemini app into an agent runtime that can run background work, call MCP-connected services, support custom sub-agents, and eventually operate a local browser. The current launch is limited, and real quality and safety still need to be proven. But the direction is clear. AI product competition is moving beyond the answer box into app permissions, connected services, approval boundaries, and long-running logs.

The practical takeaway for engineering teams is to watch the permissions more closely than the model score. What can the agent read? What can it execute? When does it ask? What proof does it show before acting? What can be undone after a mistake? Teams building MCP servers or agent connectors should move beyond prose-only tool descriptions and begin representing risk level, data scope, and side effects in a way machines can inspect. Whether Spark succeeds or fails, the fact that a 24-hour agent is moving into a mass consumer app is likely to change the baseline for AI product design.