Gemini API Webhooks turns AI work into an agent runtime
Google's Gemini API Webhooks move long-running AI jobs off polling loops and into event-driven backend operations.
- What happened: Google introduced Gemini API Webhooks, so long-running Batch, Interactions, and video generation jobs can send completion events through HTTP POST.
- The May 4, 2026 announcement frames webhooks as a way to replace polling, reduce latency, and lower operational overhead for asynchronous AI work.
- Why it matters: AI APIs are moving beyond request-response model calls toward agent runtime pieces: queues, callbacks, idempotency, replay protection, and result pointers.
- Watch: Google provides at-least-once delivery and retries for up to 24 hours, but duplicate handling, signature checks, and billing guardrails still belong to developers.

Google added Event-Driven Webhooks to the Gemini API on May 4, 2026. At first glance, this looks like a familiar developer feature. A job finishes, and a server calls your server. Payments, email platforms, CI systems, deployment tools, and collaboration software have used that pattern for years. The interesting part is not the webhook itself. It is that Google is attaching this pattern to long-running AI work inside the Gemini API.
The official announcement points to Deep Research, long video generation, and high-volume Batch API workloads that process thousands of prompts. These jobs are different from ordinary LLM calls that return in a few seconds. They can take minutes or hours. They can fail, expire, be cancelled, or require user action midway through a workflow. Until now, developers generally had to poll GET /operations to learn whether that work had finished. Google now says the Gemini API can push an HTTP POST payload to a developer endpoint when the state changes.
That sounds like a small convenience feature. In practice, it changes the operating model for AI applications. The more an AI agent performs long-running work, the less an application looks like "send a prompt, receive an answer." It starts to look like a system that registers jobs, waits for external events, follows result pointers, handles failures, and resumes the next step of a workflow. Gemini API Webhooks puts that shift on the official API surface.
This is a job model shift, not just the end of polling
The older pattern is simple. A backend creates a Batch job and checks the status endpoint every few seconds or minutes. When the job completes, the backend fetches results. If it fails, the backend records the error. At small scale, this is often enough. Once AI workloads move toward long-running jobs, polling becomes operational drag.
First, it creates unnecessary requests. If a job takes 40 minutes and the system checks every 10 seconds, that is 240 status calls for one job. Run thousands of Batch jobs at the same time and the status traffic can become a noisy source of load that is separate from the AI work itself.
Second, latency becomes tied to the polling interval. A system that checks once a minute can wait almost another minute after the job is already done. Check every five seconds and latency improves, but request volume rises. If completion triggers a user notification, a follow-up agent step, a database sync, or another queue job, that delay propagates through the workflow.
Third, state management spreads across workers. Each worker ends up polling on its own schedule, with its own timeout logic, retry behavior, and duplicate-result handling. The model may be strong, but the product still feels unreliable if the orchestration layer becomes inconsistent.
Gemini API Webhooks flips that structure. The application creates the job and waits. When an event occurs, the Gemini API pushes it. The receiving service verifies the event, stores an idempotency key, puts work onto an internal queue, and continues processing from the result pointer. That pulls AI work into event-driven backend architecture instead of treating it as a long model call.
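The receiving side of that flow can be sketched as a thin handler: verify, deduplicate, enqueue, return fast. This is an illustrative skeleton, not Google's reference implementation; the `webhook-id` header name comes from the announcement, while the in-memory store, queue, and caller-supplied `verify` callback are stand-ins for real infrastructure.

```python
import json
import queue

# In-memory stand-ins for real infrastructure: a durable idempotency
# store and an internal work queue (hypothetical, for illustration).
seen_event_ids = set()
work_queue = queue.Queue()

def handle_webhook(headers, raw_body, verify):
    """Thin webhook handler: verify, dedupe, enqueue, respond quickly.

    `verify` is a caller-supplied signature check over the headers and
    raw body; all heavy work happens later, in workers.
    """
    event_id = headers.get("webhook-id")
    if event_id is None or not verify(headers, raw_body):
        return 400  # reject unauthenticated or malformed deliveries

    if event_id in seen_event_ids:
        return 200  # duplicate delivery: acknowledge, do nothing

    seen_event_ids.add(event_id)
    # Hand the thin payload to an internal queue; workers follow the
    # result pointer later instead of processing inline.
    work_queue.put(json.loads(raw_body))
    return 200
```

Returning 200 for duplicates matters: a non-2xx response would only trigger more retries for an event that has already been handled.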
Google is emphasizing security headers
One of the more important details is that Google does not describe this as only a callback URL. The announcement and documentation both say Gemini API Webhooks follow the Standard Webhooks specification. Each request includes webhook-signature, webhook-id, and webhook-timestamp headers.
Those headers do different jobs in production. webhook-signature lets the receiver verify that the payload came from the expected sender. webhook-id gives the application a deduplication key when the same event arrives more than once. webhook-timestamp helps reject old payloads that could be replayed later. The Gemini documentation recommends rejecting payloads whose timestamp is more than five minutes old.
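A verification routine built on those three headers might look like the following. This is a sketch based on the Standard Webhooks specification, where the signed content is `"{webhook-id}.{webhook-timestamp}.{body}"` and the signature header carries `v1,<base64 HMAC-SHA256>` entries; check Google's documentation for the exact secret encoding before relying on it.

```python
import base64
import hashlib
import hmac
import time

def verify_standard_webhook(secret_b64, headers, raw_body, max_age=300):
    """Verify a Standard Webhooks HMAC signature and reject stale payloads."""
    msg_id = headers["webhook-id"]
    timestamp = headers["webhook-timestamp"]

    # Replay protection: the Gemini docs recommend rejecting payloads
    # whose timestamp is more than five minutes old.
    if abs(time.time() - int(timestamp)) > max_age:
        return False

    signed_content = f"{msg_id}.{timestamp}.{raw_body}".encode()
    secret = base64.b64decode(secret_b64)
    expected = base64.b64encode(
        hmac.new(secret, signed_content, hashlib.sha256).digest()
    ).decode()

    # The header may list several space-separated "v1,<sig>" entries;
    # compare_digest avoids timing side channels.
    for candidate in headers["webhook-signature"].split():
        version, _, sig = candidate.partition(",")
        if version == "v1" and hmac.compare_digest(sig, expected):
            return True
    return False
```

Two details are easy to get wrong in practice: the check must run over the raw request body, not a re-serialized JSON object, and the timestamp check must come before any processing.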
That matters more for AI jobs than it might seem. A Batch result or video generation output is not just a notification. It may represent paid compute, internal documents, customer data, code analysis, or a pointer to generated media. If the endpoint does not verify requests, an attacker could send fake completion events and move internal workflows forward. If duplicate events are not handled properly, a system could store the same output twice, start the next agent step twice, or create billing and notification errors.
The documentation separates static and dynamic webhooks. A static webhook is configured at the project level and verified with an HMAC signing secret. A dynamic webhook is included on a specific request, can route work to job-specific endpoints, and uses asymmetric signing through Google's public JWKS. That is not a cosmetic distinction. It gives teams different operating models. A company-wide Batch pipeline can use a static webhook. A product that creates per-customer or per-agent-run callbacks may use dynamic webhooks when that extra routing flexibility is worth the complexity.
| Aspect | Static webhook | Dynamic webhook |
|---|---|---|
| Configured in | Project-level endpoint | Per-request webhook_config |
| Verification | HMAC signing secret | JWKS based on Google's public certificates |
| Best fit | Shared Batch pipelines, Slack alerts, database sync | Per-agent-run callbacks, customer endpoints, job-specific routing |
The event catalog reveals the real shape of AI apps
The event catalog in the Gemini API Webhooks documentation is also telling. This is not just one completed event. Batch jobs can emit batch.succeeded, batch.cancelled, batch.expired, and batch.failed. The Interactions API includes interaction.requires_action, interaction.completed, interaction.failed, and interaction.cancelled. Video generation includes video.generated.
The most interesting event is interaction.requires_action. Agentic applications do not always finish automatically. They may need a function call, a user decision, or approval from another system. By making that state a webhook event, Google is showing that Gemini API work is expanding toward a stateful interaction runtime, not only a stateless model invocation.
Another important design choice is the thin payload model. The documentation says the webhook payload avoids sending the full result and instead sends status plus a pointer to the output. An example batch.succeeded payload includes an output_file_uri. In other words, the webhook is closer to a control-plane signal than a data transport channel.
That is the right shape for long-running AI jobs. Large video outputs or high-volume Batch results do not belong inside an HTTP callback payload. Sending the full result directly increases timeout risk, network cost, retry cost, and failure ambiguity. A pointer keeps delivery light and lets the receiving application process results asynchronously through a worker. As AI infrastructure grows, separating the event control plane from the result data plane becomes increasingly important.
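A receiver can treat each thin event as a routing decision rather than data. The sketch below turns an incoming payload into an internal work item; the `output_file_uri` field name appears in the documented `batch.succeeded` example, but the overall payload shape here is illustrative, not Google's exact schema.

```python
import json

def plan_work(raw_event):
    """Map a thin control-plane event to an internal work item.

    Payload layout is illustrative; only the field and event names
    mentioned in the documentation are assumed.
    """
    event = json.loads(raw_event)
    event_type = event.get("type")

    if event_type == "batch.succeeded":
        # The event is a signal; fetching the result is a separate,
        # asynchronous data-plane step performed by a worker.
        return {"action": "download", "uri": event["output_file_uri"]}
    if event_type in ("batch.failed", "batch.expired", "batch.cancelled"):
        return {"action": "record_terminal_state", "state": event_type}
    return {"action": "ignore"}
```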
Long-running AI work is now a backend design problem
The immediate benefit for developers is obvious. They can reduce polling workers, learn about completion faster, and let Gemini's delivery retry mechanism handle temporary endpoint failures. The documentation says webhook endpoints should return a 2xx response within a few seconds. Failed deliveries are retried with exponential backoff for up to 24 hours.
That also creates new responsibilities. The endpoint should verify the event, record a deduplication key, enqueue internal work, and respond quickly. If the callback handler downloads result files, updates databases, invokes another agent, and sends user notifications before returning, it increases timeout and duplicate-processing risk. A webhook handler should be thin.
The phrase at-least-once delivery also deserves attention. It means an event should arrive at least once. It does not mean it will arrive exactly once. The same event can appear more than once. Developers need idempotency keyed by webhook-id, the resource identifier, or both. Saving Batch output should check whether that job has already been processed. Starting the next agent step should check whether that run is already in progress.
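One durable way to implement that idempotency is a unique constraint in the database the service already uses, so the "have we seen this?" check and the claim are a single atomic operation. SQLite stands in for that database here; the table and function names are hypothetical.

```python
import sqlite3

# Idempotency store keyed by the webhook-id delivery header plus the
# underlying resource id, as suggested above. In-memory SQLite is a
# stand-in for the service's real database.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE processed_events ("
    " webhook_id TEXT, resource_id TEXT,"
    " PRIMARY KEY (webhook_id, resource_id))"
)

def claim_event(webhook_id, resource_id):
    """Return True only the first time this (event, resource) pair is seen."""
    try:
        with db:
            db.execute(
                "INSERT INTO processed_events VALUES (?, ?)",
                (webhook_id, resource_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # at-least-once delivery handed us a duplicate
```

Because the insert and the uniqueness check are one statement, two workers racing on the same duplicate cannot both claim it.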
This is where AI applications meet ordinary distributed systems problems. Idempotency keys, dead-letter queues, retry policies, rate limits, audit logs, and billing caps start to matter as much as prompt quality. The longer an agent works, the less a good prompt is enough. The system must know whether the agent finished, failed, needs user input, where results live, and what to do if the same event arrives again.
Gemini File Search makes the direction clearer
Google announced the multimodal expansion of Gemini API File Search the next day, on May 5, 2026. File Search can process images and text together, supports custom metadata filters, and provides page-level citations. Placed next to Webhooks, that release makes Google's developer strategy easier to read.
File Search turns the knowledge layer of an AI app into a managed Gemini API service. Webhooks turn the state and completion events of long-running work into managed events. One is the retrieval plane. The other is the execution plane. Developers can build both themselves, but Google is pulling more of the infrastructure around model calls into the Gemini platform.
That is an important direction for AI platform competition. In 2024 and 2025, the market focused heavily on model capability, context windows, token prices, and multimodal quality. In 2026, the fight is moving into operations. Who can let agents run longer? Who can make results verifiable? Who can make cost, state, delivery, and failure handling easier to manage? Those questions decide whether teams adopt a platform for real products rather than demos.
Gemini API Webhooks is Google's way of making one premise explicit: AI jobs can be long-running, and long-running jobs should be operated through events. It may look like a small API update, but it is a signal about the default structure of AI applications.
Community response is quiet but practical
This announcement did not create a large public debate in the way a frontier model launch might. I did not find a major Hacker News discussion during the research pass. The more concrete response came from implementation-focused posts. DevelopersIO in Japan published a walkthrough that receives Gemini API Webhooks through an AWS Lambda Function URL and inspects webhook-id, webhook-timestamp, and webhook-signature headers. That kind of response is more practical than hype. A webhook is not something you only read about. You attach it to infrastructure and test its delivery behavior.
At the same time, recent posts in Google Cloud and Gemini API communities have repeatedly discussed exposed API keys and unexpected Gemini API bills. That does not point to a defect in Webhooks. It does provide useful context. When long-running AI work becomes automated through external events, key restrictions, billing guardrails, replay protection, and idempotency become part of the feature.
Teams should not treat this update only as "less polling." A webhook endpoint must authenticate and verify signatures. Gemini API keys should be scoped and constrained. Expensive Batch or video generation workloads need usage caps and alerts. Event-driven automation can make a system faster. If the wrong request or exposed key enters that system, it can also turn mistakes into cost faster.
What development teams should change
Teams already using the Gemini API should start by looking for long-running jobs. If a product uses the Batch API for evaluation datasets, runs Deep Research-style workflows, or processes video generation in the background, replacing polling workers with webhooks is a natural next step.
The recommended structure is straightforward. A public webhook endpoint preserves the raw body and verifies the signature. It rejects stale timestamps. It records the event id and resource id in an idempotency store. It puts a message on an internal queue. Then it returns a 2xx response quickly. Result downloads, database updates, user notifications, and follow-up agent calls happen in workers.
Result pointers should be trusted only as pointers. The fact that Gemini sent batch.succeeded and the fact that your system successfully fetched, parsed, and stored the output are separate states. Mixing those states makes incident analysis harder. External delivery and internal processing should be tracked independently.
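Keeping those states separate can be as simple as a small per-job state machine that refuses to jump from "Gemini said succeeded" straight to "output stored." The state names below are illustrative, not part of any Gemini API.

```python
# Delivery and internal processing tracked as distinct states: the
# webhook moves a job to "delivery_received", and only a successful
# fetch of the result pointer moves it to "stored". Names are
# illustrative.
VALID_TRANSITIONS = {
    "running": {"delivery_received", "failed"},
    "delivery_received": {"fetching"},
    "fetching": {"stored", "fetch_failed"},
    "fetch_failed": {"fetching"},  # retryable: the pointer is still valid
}

def advance(current, nxt):
    """Advance a job, rejecting transitions that conflate the two planes."""
    if nxt not in VALID_TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition {current} -> {nxt}")
    return nxt
```

With this split, an incident report can distinguish "the event never arrived" from "the event arrived but the download failed," which are very different failures.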
Dynamic webhooks also need caution. They are useful, but per-job endpoints can complicate endpoint management. Products that accept customer-provided callback URLs have to think about SSRF, allowlists, domain verification, and retry storms. For a simple internal pipeline, a single static webhook plus internal metadata routing may be easier to operate.
Model APIs are becoming operations APIs
Gemini API Webhooks is not a glamorous model launch. There are no benchmark numbers, no new reasoning capability, and no claim of a larger context window. For teams building real AI products, that is exactly why it matters. As long-running AI work grows, a model API becomes an operations API.
Google has made it easier to connect Gemini's long-running jobs to event-driven backends. It has also sent a clear message: agentic AI applications will not scale well on polling loops alone. Signed events, thin payloads, idempotency, retry behavior, queues, audit logs, and billing controls become core parts of the stack.
The main takeaway is not simply that developers can stop polling. It is that there is now an official path for treating AI work as a stateful workflow. If Gemini API File Search provides a managed knowledge layer and Gemini API Webhooks provides a managed execution-event layer, the next platform fight is likely to move beyond model quality toward who provides the more reliable agent runtime.