642M Business Graph Shows the Agent Data Bottleneck
D&B rebuilding its 642 million-company graph for AI agents shows why enterprise automation now depends on verified business identity, not only better models.
- What happened: D&B has described how it rebuilt data on 642 million businesses into a Commercial Graph that AI agents can query and act on.
- D&B's own survey says 97% of companies are running AI initiatives, while only 5% say their data is sufficiently prepared to support them.
- Why it matters: The next enterprise agent bottleneck is not prompt wording. It is a verified layer that connects business identity, ownership, risk, and system records to the same real-world entity.
- Builder impact: KYC/KYB, supplier risk, and audit documentation are moving into agent workflows through integrations such as D&B's MCP-based connection to Claude.
- Without permissions, freshness guarantees, explainability, and audit logs, agents may make automation faster while making responsibility harder to trace.
Dun & Bradstreet, the long-running business data company better known as D&B, is showing a useful shift in the enterprise AI market. VentureBeat reported on May 22, 2026 that D&B had rebuilt its database of 642 million companies and relationships so AI agents could use it more directly. At first glance, that sounds like a database modernization story. For AI builders, the signal is larger: agent competition is moving below models, context windows, and tool calls into the data layer that tells a system exactly who or what it is acting on.
Most enterprise agent demos follow a familiar path. A user asks, "Onboard this company." The agent reads documents, searches the web, updates a CRM, and writes a risk report. In a real enterprise system, that simple sentence becomes complicated almost immediately. The same company can appear under different names in CRM, ERP, procurement, payment systems, sanctions lists, and corporate registry data. A branch, subsidiary, ultimate parent, trading counterparty, supplier, and beneficial owner may all matter to the same decision. Human analysts often resolve that context by switching screens and applying experience. An agent does not reliably have that implicit knowledge unless the data layer gives it a stable identity map.
That is the problem D&B is trying to frame. In its May 4, 2026 AI Momentum Survey, the company said enterprise AI adoption had already moved beyond experimentation, while data readiness had not caught up. The survey covered 10,000 businesses across 32 countries. D&B said 97% of respondents were pursuing AI initiatives, and 56% expected to increase AI investment over the next 12 months. But only 5% said their data was sufficiently prepared to support AI.
That combination of numbers is a useful way to read the current AI infrastructure market. Model quality keeps improving, and agent frameworks are rapidly absorbing file systems, browsers, code execution, and MCP servers. But inside companies, the core question is often less "what can the agent do?" and more "which object is the agent allowed to act on?" If a new customer is not clearly tied to a real legal entity, a sanctions exposure, an ultimate parent, or an existing internal record under another name, the agent can make a wrong decision with confidence.
D&B calls the missing layer a "verified commercial identity foundation." The company created the D-U-N-S Number in 1963 and describes it as a global identifier for commercial entities. Its Commercial Graph uses that identifier to connect business identity and relationships across systems. As marketing language, this can sound like ordinary master data management. In the agent era, the meaning changes. Old MDM projects mostly cleaned up reports, dashboards, and human workflows. Now, identity resolution becomes a precondition for AI systems that call APIs, generate documents, trigger approvals, and move a workflow forward.
D&B's Anthropic integration makes that shift easier to see. On May 5, 2026, D&B announced risk and compliance workflows for Claude. The core idea is to connect the D&B Commercial Graph to Claude through an MCP server so financial institutions can automate KYC/KYB onboarding, ownership checks, global third-party risk assessment, and risk decision documentation. D&B says the integration can help verify new legal-entity customers in seconds and create auditable documents.
The important word is not just "seconds." It is "auditable." Speed is now a standard agent promise. In regulated industries, speed alone is not enough. Teams need to know why a company was approved, which data supported the decision, which risk signals were considered, where a human reviewed the output, and whether the same decision can be reconstructed later. D&B's announcement is therefore not simply about putting more data inside Claude. It is about giving a model verified business context and decision support before it acts.
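To make "auditable" concrete, the sketch below captures the items this paragraph lists (the entity acted on, the supporting data, the risk signals considered, and any human reviewer) in a form that can be serialized and reconstructed later. It assumes a plain Python dataclass with hypothetical field names; it is not D&B's or Anthropic's decision format. The SHA-256 fingerprint only helps detect later edits if it is stored somewhere the record's editor cannot also change.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class DecisionRecord:
    """Hypothetical audit record for one agent decision (not a vendor format)."""
    entity_id: str                  # verified identifier the decision applies to
    outcome: str                    # e.g. "approve", "reject", "escalate"
    evidence: tuple = ()            # (source, field, value) triples that supported it
    risk_signals: tuple = ()        # risk flags considered, including non-blocking ones
    reviewer: Optional[str] = None  # human reviewer, or None if fully automated
    decided_at: str = ""            # ISO-8601 timestamp supplied by the caller

    def to_json(self) -> str:
        # Canonical form: sorted keys and no whitespace, so equal records hash equally.
        return json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))

    def fingerprint(self) -> str:
        return hashlib.sha256(self.to_json().encode("utf-8")).hexdigest()

    @classmethod
    def from_json(cls, text: str) -> "DecisionRecord":
        data = json.loads(text)
        # JSON turns tuples into lists; convert back so equality still holds.
        data["evidence"] = tuple(tuple(item) for item in data["evidence"])
        data["risk_signals"] = tuple(data["risk_signals"])
        return cls(**data)


# Usage: store the JSON and the fingerprint, then reconstruct the decision later.
record = DecisionRecord(
    entity_id="REG-001",
    outcome="approve",
    evidence=(("registry", "legal_name", "Acme Corp"), ("sanctions_screen", "status", "clear")),
    risk_signals=("new_beneficial_owner",),
    reviewer="analyst-42",
    decided_at="2026-05-22T10:00:00Z",
)
restored = DecisionRecord.from_json(record.to_json())
```

Keeping non-blocking risk signals in the record matters as much as the outcome: a later reviewer can see what was considered and set aside, not only what was decided.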

Recent enterprise agent announcements show the same pattern from different angles. Google has emphasized sandboxes and execution environments around Managed Agents and Antigravity. GitHub and OpenAI-aligned coding agents talk about work sessions, pull requests, logs, and permission controls. Anthropic highlights MCP, connectors, self-hosted sandboxes, and enterprise data access. D&B occupies a different position. It is not primarily selling an agent runtime. It is selling a reference layer that helps agents identify and reason about real-world businesses.
This also connects to the limits of document-first RAG. In 2024 and 2025, the default prescription for many enterprise AI projects was to chunk documents, embed them, retrieve relevant passages, and place them in the model context. That still matters. But for KYC, supply-chain risk, credit, procurement, and B2B sales, text similarity is not enough. A system has to know whether "Apple" means Apple Inc., a local reseller, an affiliate, a nickname in a customer's notes, or a supplier record imported years ago. If data with the same address, tax ID, domain, or parent relationship is scattered across systems, retrieval can look plausible while the decision stays unstable.
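A small sketch of why name similarity alone is unsafe: two records whose names normalize to the same string can still be different legal entities, and a free-text mention may carry no identifier at all. The `registry_id` field and the sample records are hypothetical; `difflib.SequenceMatcher` from the Python standard library stands in for whatever fuzzy matcher a real pipeline would use.

```python
import re
from difflib import SequenceMatcher
from typing import Optional


def normalize(name: str) -> str:
    # Lowercase, drop punctuation, collapse whitespace: "Apple, Inc." -> "apple inc"
    cleaned = re.sub(r"[^a-z0-9 ]", "", name.lower())
    return " ".join(cleaned.split())


def name_similarity(a: dict, b: dict) -> float:
    return SequenceMatcher(None, normalize(a["name"]), normalize(b["name"])).ratio()


def same_entity(a: dict, b: dict) -> Optional[bool]:
    """True/False when both records carry an identifier; None means 'unresolved'."""
    if a.get("registry_id") and b.get("registry_id"):
        return a["registry_id"] == b["registry_id"]
    return None  # a name match alone is a candidate, not a resolution


crm = {"name": "Apple Inc.", "registry_id": "REG-001"}
supplier = {"name": "APPLE, INC.", "registry_id": "REG-842"}  # same name, different legal entity
notes = {"name": "apple", "registry_id": None}                # free-text mention, no identifier

print(name_similarity(crm, supplier))  # 1.0
print(same_entity(crm, supplier))      # False
print(same_entity(crm, notes))         # None
```

A retrieval step scoring on names would treat the first pair as a perfect match; only the identifier comparison shows they are different companies.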
The D&B case is more concrete than the slogan "RAG needs graphs." An acting agent needs at least three things. First, it needs entity resolution, so records from multiple systems can point to the same business. Second, it needs a relationship graph, so subsidiaries, parents, suppliers, beneficial owners, and sanctions exposure can be followed. Third, it needs policy and auditability, so the system records which identity, relationship, and risk data informed a decision. Vector databases are strong at finding relevant text fragments. A Commercial Graph-style layer pins down the target of the action: the specific business the agent is about to approve, block, or update.
| Bottleneck | Document RAG approach | Business identity graph |
|---|---|---|
| Target identification | Searches similar text and metadata | Connects companies, branches, and ownership relationships through unique identifiers |
| Risk decisioning | Lets the model infer from retrieved documents | Combines verified data with predefined risk logic |
| Auditability | Depends on prompts and retrieval logs | Tracks the identity, relationship, and risk basis used in the decision |
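The entity-resolution and relationship-graph needs described above can be sketched with standard techniques: union-find to cluster records that share a strong key, and a walk up parent links to reach the ultimate parent and check the chain against a screening list. This is a minimal illustration, not how D&B builds its graph. The choice of `tax_id` and `domain` as merge keys is an assumption, and domain matching in particular can over-merge subsidiaries that share a website. The outputs (clusters, ownership chain, flagged entities) are what the policy-and-audit layer would then record.

```python
from collections import defaultdict


class UnionFind:
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def resolve_entities(records, strong_keys=("tax_id", "domain")):
    """Entity resolution: cluster records that share any strong key, transitively."""
    uf = UnionFind()
    first_holder = {}  # (key, value) -> first record_id seen with that value
    for rec in records:
        uf.find(rec["record_id"])
        for key in strong_keys:
            value = rec.get(key)
            if not value:
                continue
            token = (key, value)
            if token in first_holder:
                uf.union(first_holder[token], rec["record_id"])
            else:
                first_holder[token] = rec["record_id"]
    clusters = defaultdict(list)
    for rec in records:
        clusters[uf.find(rec["record_id"])].append(rec["record_id"])
    return sorted(sorted(ids) for ids in clusters.values())


def ownership_chain(entity, parent_of):
    """Relationship graph: follow parent links up to the ultimate parent."""
    chain = [entity]
    while chain[-1] in parent_of:
        nxt = parent_of[chain[-1]]
        if nxt in chain:
            raise ValueError(f"ownership cycle at {nxt}")
        chain.append(nxt)
    return chain


def sanctions_exposure(entity, parent_of, sanctioned):
    """Entities anywhere in the ownership chain that appear on a screening list."""
    return [e for e in ownership_chain(entity, parent_of) if e in sanctioned]


# Hypothetical records from CRM, ERP, procurement, and payments.
records = [
    {"record_id": "crm-1", "tax_id": "T1", "domain": "acme.example"},
    {"record_id": "erp-7", "tax_id": "T1"},
    {"record_id": "proc-3", "domain": "acme.example"},
    {"record_id": "pay-9", "tax_id": "T9", "domain": "acme-shop.example"},
]
parent_of = {"acme-logistics": "acme-holdings", "acme-holdings": "acme-global"}
```

Here `proc-3` joins the `crm-1` cluster through the shared domain even though it has no tax ID, which is exactly the kind of merge a reviewer should be able to inspect.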
The barriers in D&B's survey point in the same direction. Half of companies cited limited data access as a major obstacle. D&B also reported privacy and compliance risks, data quality and integrity, and lack of integration between systems as recurring blockers. These are not problems a smarter model solves on its own. The agent must be able to access the needed data, that data must be accurate, it must mean the same thing across systems, and sensitive actions need a responsible approval path.
For development teams, the practical questions are sharp. If the agent you are building can move across CRM, billing, procurement, warehouse, and support systems, how does it identify the same customer or supplier? Is the source of truth a name string, email domain, internal ID, external identifier, or a combination? When a model says two records are the same company, can a human inspect the basis for that merge? If a bad merge leads to automatic approval or automatic blocking, can the organization unwind the decision and understand where it came from?
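One way to make merges inspectable and reversible is to treat them as an append-only log rather than overwriting records. In this hypothetical `IdentityLedger`, every merge must carry its basis and its proposer (model or human), canonical IDs are recomputed by replaying active merges, and reverting a merge marks it rather than deleting it, so the record of why two records were once treated as one survives the reversal. Tracing which downstream approvals relied on a merge would need an additional link from decisions to merge IDs, which this sketch leaves out.

```python
from dataclasses import dataclass


@dataclass
class MergeEvent:
    merge_id: int
    survivor: str       # record that remains canonical
    absorbed: str       # record now treated as the same business
    basis: tuple        # evidence a human reviewer can inspect
    proposed_by: str    # "model" or a human user id
    reverted: bool = False


class IdentityLedger:
    """Hypothetical append-only merge log; canonical IDs are derived by replay."""

    def __init__(self):
        self.events = []

    def merge(self, survivor, absorbed, basis, proposed_by):
        if not basis:
            raise ValueError("refusing a merge with no recorded basis")
        event = MergeEvent(len(self.events) + 1, survivor, absorbed, tuple(basis), proposed_by)
        self.events.append(event)
        return event.merge_id

    def unmerge(self, merge_id):
        if not 1 <= merge_id <= len(self.events):
            raise KeyError(merge_id)
        self.events[merge_id - 1].reverted = True  # keep the event, mark it reverted

    def resolve(self, record_id):
        # Replay active merges, then follow survivor pointers to the canonical record.
        pointer = {e.absorbed: e.survivor for e in self.events if not e.reverted}
        seen = set()
        while record_id in pointer and record_id not in seen:
            seen.add(record_id)
            record_id = pointer[record_id]
        return record_id

    def why(self, record_id):
        # Every active merge that touched this record, with its basis and proposer.
        return [e for e in self.events if not e.reverted and record_id in (e.survivor, e.absorbed)]
```

Usage: a model-proposed merge based only on name similarity can be reverted without losing the tax-ID merge that was made earlier.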
Those questions become especially important in the MCP ecosystem. MCP standardizes tool connections, but the existence of a tool connection does not guarantee data quality. Claude reading the D&B Commercial Graph through an MCP server is an example of "tool calling" paired with verified domain data. Many companies will wrap internal data sources as MCP servers. Wrapping alone will not be enough. Agent-facing data sources need identifier policy, permission scope, field semantics, freshness guarantees, and audit logs designed with automation in mind.
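A sketch of what "designed with automation in mind" might mean on the read path: every exposed field has a declared permission scope and a maximum age, and every attempt, including denials, is logged. `FieldPolicy` and `AgentFacingSource` are hypothetical names. This is plain Python rather than any MCP SDK API; an MCP tool handler would be one possible caller.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldPolicy:
    scope: str        # permission scope a caller must hold to read the field
    max_age_s: float  # maximum age of the stored value before it is refused as stale


class AgentFacingSource:
    """Hypothetical read path that a tool handler could delegate to."""

    def __init__(self, store, policies, clock=time.time):
        self.store = store        # entity_id -> {field: (value, fetched_at_epoch_s)}
        self.policies = policies  # field -> FieldPolicy; fields without a policy are not exposed
        self.clock = clock
        self.audit_log = []

    def read(self, caller_scopes, entity_id, field):
        now = self.clock()
        policy = self.policies.get(field)
        value = None
        if policy is None:
            outcome = "denied: field has no declared policy"
        elif policy.scope not in caller_scopes:
            outcome = "denied: missing scope"
        else:
            entry = self.store.get(entity_id, {}).get(field)
            if entry is None:
                outcome = "not found"
            elif now - entry[1] > policy.max_age_s:
                outcome = "denied: stale"
            else:
                outcome, value = "ok", entry[0]
        # Log every attempt, including denials, so automated access can be reconstructed.
        self.audit_log.append({"at": now, "entity": entity_id, "field": field,
                               "scopes": sorted(caller_scopes), "outcome": outcome})
        return outcome, value


source = AgentFacingSource(
    store={"E1": {"legal_name": ("Acme Corp", 990.0), "sanctions_status": ("clear", 100.0)}},
    policies={"legal_name": FieldPolicy("kyb:read", 3600),
              "sanctions_status": FieldPolicy("risk:read", 60)},
    clock=lambda: 1000.0,  # fixed clock so the example is reproducible
)
print(source.read({"kyb:read"}, "E1", "legal_name"))         # ('ok', 'Acme Corp')
print(source.read({"kyb:read"}, "E1", "sanctions_status"))   # ('denied: missing scope', None)
print(source.read({"risk:read"}, "E1", "sanctions_status"))  # ('denied: stale', None)
```

Refusing a stale sanctions status, rather than returning it silently, pushes the agent toward a refresh or a human check instead of a confident decision on old data.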
None of this means D&B's approach automatically becomes the standard. D&B naturally interprets the market through its own data assets and identifier system. Moody's, LexisNexis, S&P, PitchBook, OpenCorporates, and vertical data providers can all pursue adjacent positions. Large enterprises already have MDM, customer data platforms, data catalogs, knowledge graphs, and data quality tools. Many of those products will be reintroduced as "agent-ready data" layers. Builders should look past the AI label and inspect the API contract: which identifiers are returned, how conflicts are represented, whether evidence is included, and whether human review state can be recorded.
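The API-contract questions in this paragraph can be turned into a checkable shape. The types below are hypothetical and vendor-neutral: identifiers keyed by scheme, evidence tied to each attribute, conflicts that keep the disagreeing candidates instead of silently picking one, and an explicit review state. `contract_problems` reports where a response falls short of this assumed contract.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ReviewState(Enum):
    UNREVIEWED = "unreviewed"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass(frozen=True)
class Evidence:
    source: str      # which provider or internal system asserted the value
    field_name: str  # which attribute it supports
    value: str


@dataclass(frozen=True)
class FieldConflict:
    field_name: str
    candidates: tuple  # Evidence items that disagree with each other


@dataclass
class EntityResponse:
    identifiers: dict  # scheme -> value, e.g. {"registry": "...", "internal_crm": "..."}
    attributes: dict   # field_name -> resolved value
    evidence: tuple    # Evidence items backing the attributes
    conflicts: tuple = ()
    review_state: ReviewState = ReviewState.UNREVIEWED
    reviewer: Optional[str] = None


def contract_problems(resp: EntityResponse) -> list:
    """Return the ways a response falls short of this hypothetical contract."""
    problems = []
    if not resp.identifiers:
        problems.append("no identifiers returned")
    evidenced = {e.field_name for e in resp.evidence}
    for name in resp.attributes:
        if name not in evidenced:
            problems.append(f"attribute '{name}' has no evidence")
    for conflict in resp.conflicts:
        if len(conflict.candidates) < 2:
            problems.append(f"conflict on '{conflict.field_name}' lists fewer than two candidates")
    if resp.review_state is not ReviewState.UNREVIEWED and not resp.reviewer:
        problems.append("review state set without a reviewer")
    return problems
```

Running a check like this against a vendor's sample responses is a quick way to see whether "agent-ready" means anything beyond a new label.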
Community reaction appears limited so far. As of May 22, 2026, the D&B story did not appear to have turned into a large discussion on Hacker News or GeekNews. A Reddit summary was shared at small scale. That muted reaction does not make the infrastructure shift less important. VentureBeat's same-week coverage of agent memory, RAG limitations, AI coding failures, and Google-style managed agent execution points to a broader theme: production AI is increasingly constrained by the operational layer outside the model.
The shortest reading of this news is that enterprise AI agents are moving from "models that speak well" toward automation systems that act on verified real-world objects. In that transition, a database is no longer just a back-office store that humans search. It becomes identity infrastructure an agent consults before deciding who it is dealing with, what it can approve, and which risks it is accepting.
That is why the 642 million-business graph matters as more than a scale claim. The gap between 97% AI initiative adoption and 5% data readiness is the current enterprise AI bottleneck. The companies that close that gap may be model labs, cloud platforms, or older data vendors like D&B. The durable lesson is simpler: as agents take on real workflows, the decisive question shifts from "what did the model generate?" to "who did it act on, and on what evidence?"