Reference Case Study: The $47 Billion Integration Problem

Used in: Module 1 (Semantic Data Fundamentals)
Purpose: To make the abstract claim "data quality is the floor constraint" concrete and economically visible to an SMB client in 90 minutes.

The headline figure

A 2022 industry analysis estimated that US healthcare alone spends approximately $47 billion per year on data integration projects. That figure includes ETL development, master data management, interface engines, custom mapping layers, integration consulting, and the staff time spent reconciling records across systems that were supposed to interoperate but did not.

The figure is contested in its details. Different studies produce different totals depending on what is counted. The lower-bound estimates are in the $30B range; the upper-bound estimates exceed $60B. The exact number does not matter for the case study. What matters is the order of magnitude and what it tells us about where the money goes.

Where the money goes

Roughly speaking, integration spending is consumed by four activities:

1. Building pipes: ETL jobs, message brokers, API gateways, file drops. This is the syntactic layer: moving bytes from one system to another. It is the work that vendors and practitioners are best at, and it is the work that gets quoted in proposals.
2. Mapping fields: Deciding that pt_dob in System A means the same thing as birth_dt in System B. Then deciding what to do when System A uses YYYY-MM-DD and System B uses Unix timestamps. Then handling the edge case where one system stores patient age and the other stores birth date.
3. Reconciling records: Discovering that "John Smith" in the EHR is "Jonathon Smith" in the billing system and "J. Smith" in the lab feed. Building (and rebuilding) the master patient index that ties them together. Re-running the reconciliation when the rules change.
4. Fixing breakage: A field is renamed in one system. A new code is added to a vocabulary. A timezone changes. A merger introduces a new EHR. Each event sends ripples through every downstream integration that depended on the old shape.

Activities 1-2 are typically scoped and budgeted. Activities 3-4 are not. They become operational overhead that grows quietly year over year.
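To make the field-mapping activity concrete, here is a minimal Python sketch. Only the field names pt_dob and birth_dt come from the text above; the sample values, the function names, and the assumption that System B's timestamps are UTC seconds are illustrative, not taken from any real system.

```python
from datetime import date, datetime, timezone

# Hypothetical sample records; only the field names come from the case study.
system_a_record = {"pt_dob": "1980-03-15"}  # ISO date string (YYYY-MM-DD)
system_b_record = {"birth_dt": 321926400}   # Unix timestamp, assumed UTC seconds


def dob_from_system_a(record: dict) -> date:
    """Parse System A's YYYY-MM-DD string into a date."""
    return date.fromisoformat(record["pt_dob"])


def dob_from_system_b(record: dict) -> date:
    """Convert System B's Unix timestamp into a date, assuming UTC."""
    return datetime.fromtimestamp(record["birth_dt"], tz=timezone.utc).date()


def canonical_dob(record: dict) -> date:
    """Return the date of birth from whichever known field is present."""
    if "pt_dob" in record:
        return dob_from_system_a(record)
    if "birth_dt" in record:
        return dob_from_system_b(record)
    # A renamed or missing field surfaces here: this is the breakage activity.
    raise KeyError("no known date-of-birth field in record")


print(canonical_dob(system_a_record) == canonical_dob(system_b_record))
```

Every decision in this sketch (which field wins, which timezone applies, what happens when a field is renamed) is a semantic choice embedded in project code. An age-only source cannot be mapped this way at all: an age determines a one-year window of possible birth dates, not a date, so that case needs an explicit policy rather than a conversion function.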

What is being purchased

Most integration spending purchases syntactic interoperability: the bytes flow. A nurse can pull up a patient record. A claim can be submitted. A lab result can post back to the EHR. From the user's perspective, the systems "talk to each other."

What is not purchased is semantic interoperability: a shared, defensible understanding of what those bytes mean. The mapping layer carries the semantics, but the mapping layer is bespoke per project, lives in code or configuration that nobody wants to touch, and is rebuilt from scratch the next time a new system is added.

The result is that healthcare organizations across the country each pay for the same conceptual problem to be re-solved, by different vendors using different mappings, with no shared substrate. The $47B figure is the aggregate of that repeated work.
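One way to see why bespoke mappings get expensive as systems are added is a simple counting sketch. It assumes a worst case in which every pair of systems needs its own mapping, and it counts each pair once; real environments integrate only some pairs, so treat the numbers as an upper bound rather than a cost model. The function names are hypothetical.

```python
def point_to_point_mappings(n_systems: int) -> int:
    """Mappings needed if every pair of systems is mapped directly."""
    return n_systems * (n_systems - 1) // 2


def shared_model_mappings(n_systems: int) -> int:
    """Mappings needed if each system maps once to a shared model."""
    return n_systems


for n in (2, 5, 10):
    print(n, point_to_point_mappings(n), shared_model_mappings(n))
# prints:
# 2 1 2
# 5 10 5
# 10 45 10
```

Note the first row: for a single two-system integration, the direct mapping is cheaper than building toward a shared model. The shared approach only pays off as more systems are added, which is part of why it rarely gets funded inside one project.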

Why this is not a vendor problem

It is tempting to blame the vendors for not solving this. The vendors are not the problem. The vendors are responding to what their customers ask for, which is "make these two systems talk." Customers do not ask for "build a shared semantic substrate that will make the next ten integrations cheaper" because that is not what shows up in this quarter's budget.

The problem is structural. As long as semantic interoperability is purchased one project at a time, it will be rebuilt one project at a time. The fix requires a substrate that exists outside any individual project — a reference model, stable identifiers, shared constraints — that every project can attach to instead of recreating.
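As a rough illustration of what "stable identifiers" and "shared constraints" could look like in practice, the following Python sketch defines a tiny reference entity and links source records by identifier instead of by name. The Person class, the person_id format, and the sample records are all hypothetical; this is a generic illustration, not the reference model or standard used in the curriculum.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Person:
    """Hypothetical reference-model entity shared by every integration."""
    person_id: str    # stable identifier, assigned once
    birth_date: date

    def __post_init__(self) -> None:
        # Shared constraints live with the model, not inside each project.
        if not self.person_id:
            raise ValueError("person_id is required")
        if self.birth_date > date.today():
            raise ValueError("birth_date cannot be in the future")


reference = {"P-0001": Person("P-0001", date(1980, 3, 15))}

# Each source keeps its own spelling of the name but carries the shared ID.
source_records = [
    {"source": "ehr", "person_id": "P-0001", "name": "John Smith"},
    {"source": "billing", "person_id": "P-0001", "name": "Jonathon Smith"},
    {"source": "lab", "person_id": "P-0001", "name": "J. Smith"},
]


def link_by_id(records: list, known: dict) -> dict:
    """Group source systems by person_id, rejecting unknown identifiers."""
    linked = defaultdict(list)
    for record in records:
        pid = record["person_id"]
        if pid not in known:
            raise KeyError(f"unknown person_id: {pid}")
        linked[pid].append(record["source"])
    return dict(linked)


print(link_by_id(source_records, reference))
# prints: {'P-0001': ['ehr', 'billing', 'lab']}
```

With the identifier in place, linking the three name variants is a lookup rather than a fuzzy-matching exercise, and the constraint on birth_date is written once instead of once per integration.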

Why an SMB should care about a $47B healthcare number

The SMB client is not facing $47B of integration cost. They are facing $20K, or $80K, or maybe $200K, but they are facing it for the same structural reason. The canonical example in our curriculum, Atlas Legal's 2022 Clio↔QuickBooks integration attempt, failed because the underlying entities had no shared identifier and no shared constraint vocabulary. The same failure mode operates at every scale.

The lesson the SMB should take away is not "we are like a hospital." It is "the reason this kept failing is the same reason it keeps failing everywhere, and there is now a way to fix it that is small enough to fit our budget."

Using this case study in a client conversation

In Module 1 you learn the framework. In Session 1 of a client engagement you may have 2-3 minutes to motivate why the client should care about something as abstract as "semantic data infrastructure." This case study compresses to:

"There is a number that gets thrown around in healthcare — $47 billion a year on data integration that mostly delivers pipes without meaning. Every project in that number is the same failure that broke your Clio integration in 2022. The same failure mode at a different scale. The reason it keeps failing is that nobody is building the layer underneath. We finally have an open standard for that layer, and it is small enough to put on a single Linux box in your office. That is what I want to walk you through."

The number opens the door. The framework walks through it.

Sources and caveats

The $47B figure circulates in industry analyst presentations and trade press; it is not from a single peer-reviewed study. Trainees should not cite it as a hard statistic to skeptical or technical audiences without adding "industry estimates range from roughly $30B to more than $60B annually." The point is the order of magnitude and the structural cause, not the precision.

Similar figures exist for other sectors:

US financial services data integration spend is estimated at $25-40B/year
Manufacturing integration (ERP, MES, PLM, supply chain) is estimated at $15-30B/year
Government and public sector integration spend is estimated at $20-50B/year depending on what is included

Use whichever sector best matches your client's frame of reference. The structural argument is the same: integration spending is large, recurring, and primarily purchases syntactic interoperability that has to be rebuilt every time. The Maturity Map exists to make this visible.