SDC Maturity Framework v0.1

A structured model for assessing organizational readiness across six dimensions of semantic data infrastructure.


Overview

The SDC Maturity Framework assesses an organization across six dimensions on a five-level scale. Level 3 represents the on-track baseline where an average organization should be today to remain competitive and compliant. Levels below 3 indicate capability gaps. Levels above 3 indicate strategic advantage.

The framework is deliberately prescriptive about what "on-track" means. Most organizations are currently below the on-track line on most dimensions. This is not pessimism — it is the same capability overhang pattern visible across every industry adopting AI and autonomous systems.

Design principles

The framework rests on three principles that shape both how it is scored and how interventions should be sequenced:

  1. The floor constraint principle — the three foundational dimensions (Schema Integrity, Constraint Enforcement, Semantic Identity) cap the effective level of the three derived dimensions (Provenance, Interoperability, Governance). You cannot govern what you cannot identify or constrain.

  2. Two-level modeling — separate a stable reference model (slow-changing primitives) from fast-changing domain constraints. Domain modelers author constraints in their own vocabulary; the reference model guarantees those constraints compose across domains because they all bottom out in the same primitives.

  3. Minimum knowledge modeling — a model component captures only what is essential to identify the concept and distinguish it from its nearby concepts. Specificity beyond that minimum is achieved through composition with other minimum components, not by inflating any single definition. This is the discipline that makes the floor constraint achievable, the practitioner economics moat real, and deterministic graph construction possible. See training/curriculum/Module_1_Semantic_Data_Fundamentals.md §1.6 for the full treatment.

These three principles are not preferences. They are structural — they determine whether the rest of the framework holds together at all.
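
One way to picture two-level modeling is the minimal sketch below. The Quantity primitive and the QuantityConstraint class are hypothetical stand-ins, not the actual SDC reference model, and the numeric ranges and unit codes are illustrative assumptions only.

```python
from dataclasses import dataclass


# Reference model layer: a small, slow-changing primitive (hypothetical).
@dataclass(frozen=True)
class Quantity:
    value: float
    unit: str


# Domain layer: fast-changing constraints authored in domain vocabulary,
# but always expressed against a reference-model primitive.
@dataclass(frozen=True)
class QuantityConstraint:
    label: str
    unit: str
    minimum: float
    maximum: float

    def accepts(self, q: Quantity) -> bool:
        # Same check for every domain, because every domain constrains
        # the same primitive.
        return q.unit == self.unit and self.minimum <= q.value <= self.maximum


# Two unrelated domains reuse one primitive (ranges are made up).
body_temp = QuantityConstraint("Body temperature", "Cel", 30.0, 45.0)
oven_temp = QuantityConstraint("Oven temperature", "Cel", 50.0, 300.0)
```

Because both constraints bottom out in Quantity, a consumer that understands the primitive can evaluate either one without domain-specific code.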


The Five Levels

Level | Label | Description
1 | Ad Hoc | No formal approach. Each team/system handles data independently. Institutional knowledge lives in people's heads.
2 | Documented | Approach exists in documents but is not enforced. Compliance is manual and inconsistent.
3 | Enforced (On-Track) | Rules are machine-enforced at the data layer. Non-compliant data cannot enter the system. Humans verify, machines enforce.
4 | Integrated | Enforcement crosses system boundaries. Data validated once is trusted everywhere in the organization.
5 | Federated | Enforcement crosses organizational boundaries. Data shared with external parties carries its own contract.

The Six Dimensions

1. Schema Integrity

Question: Are data structures defined, versioned, and enforced?

Level | Criteria
1 | No formal schemas. Data structures exist as table definitions or form fields without documentation.
2 | Schemas documented in wikis or PDFs. No automated enforcement. Fields get added without review.
3 | Schemas defined in machine-readable format (XSD, JSON Schema, Protobuf). Changes go through version control. Invalid data is rejected at write time.
4 | Schemas are the single source of truth across all systems in the organization. A change to the schema propagates automatically to all consumers.
5 | Schemas are portable and recognized across organizational boundaries. External partners consume the same schemas. Versioning guarantees non-breaking evolution.

SDC Alignment: Level 3+ requires XSD 1.1 or equivalent with structural validation. SDCStudio generates these natively.
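
As a rough illustration of the Level 3 criterion (invalid data is rejected at write time), the sketch below uses a hand-rolled, versioned schema as a stand-in for a real XSD 1.1 or JSON Schema validator. PATIENT_SCHEMA, validate, and SchemaStore are hypothetical names and are not part of SDC or SDCStudio.

```python
# Hypothetical stand-in for a machine-readable schema; a real system
# would use XSD 1.1, JSON Schema, or Protobuf instead of this dict.
PATIENT_SCHEMA = {
    "version": "1.0.0",
    "fields": {"patient_id": str, "age": int},
}


def validate(record: dict, schema: dict) -> list[str]:
    """Return a list of structural errors; an empty list means valid."""
    errors = []
    fields = schema["fields"]
    for name, expected_type in fields.items():
        if name not in record:
            errors.append(f"missing field: {name}")
        elif not isinstance(record[name], expected_type):
            errors.append(f"wrong type for {name}")
    for name in record:
        if name not in fields:
            errors.append(f"undeclared field: {name}")
    return errors


class SchemaStore:
    """Toy write path that refuses any record failing validation."""

    def __init__(self, schema: dict):
        self.schema = schema
        self.rows: list[dict] = []

    def write(self, record: dict) -> None:
        errors = validate(record, self.schema)
        if errors:
            raise ValueError("; ".join(errors))
        self.rows.append(record)
```

The point of the design is that the rejection happens in the write path itself, so an undeclared field cannot be added without a schema change.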


2. Constraint Enforcement

Question: Do validation rules travel with the data?

Level | Criteria
1 | Validation rules exist only in application code. When an application is replaced, the rules are lost.
2 | Some rules in database CHECK constraints, some in application code, some in stored procedures. No single source of truth.
3 | Rules live in the schema itself. Every consumer of the data can verify compliance without access to the source system.
4 | Rules are standardized across the organization. "Patient age must be non-negative" means the same thing in every system.
5 | Rules are portable across organizations. A constraint violated in one system is detectable in any downstream system.

SDC Alignment: Level 3+ requires constraint binding to the schema (XSD assertions, SHACL shapes). SDC provides this at the atomic level.
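
The idea that rules travel with the data can be sketched as follows. This is a minimal illustration that assumes a JSON envelope and a made-up rule vocabulary ("min", "max"); a real deployment would express the same rule as an XSD assertion or a SHACL shape.

```python
import json

# Hypothetical envelope: declarative rules ship alongside the values.
envelope = json.dumps({
    "constraints": [
        # "Patient age must be non-negative"
        {"field": "age", "rule": "min", "value": 0},
    ],
    "data": {"patient_id": "p-1", "age": -4},
})

# Rule names map to plain predicates; this vocabulary is illustrative.
RULES = {
    "min": lambda actual, bound: actual >= bound,
    "max": lambda actual, bound: actual <= bound,
}


def check(envelope_json: str) -> list[str]:
    """Verify the data against its bundled constraints, with no access
    to the system that produced it."""
    doc = json.loads(envelope_json)
    violations = []
    for c in doc["constraints"]:
        actual = doc["data"].get(c["field"])
        if actual is None or not RULES[c["rule"]](actual, c["value"]):
            violations.append(f'{c["field"]} violates {c["rule"]} {c["value"]}')
    return violations
```

Any downstream consumer holding only the envelope can call check and detect the negative age, which is the property Levels 3 through 5 build on.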


3. Semantic Identity

Question: Does every data element have a stable, unambiguous identifier?

Level | Criteria
1 | Records identified by auto-increment integers or local sequence numbers. Identity is database-local.
2 | Records identified by UUIDs or similar. Identity is globally unique but carries no semantic meaning.
3 | Elements identified by stable CUIDs bound to their semantic definition. Identity survives system migration.
4 | Identity is reusable across systems. The same patient, product, or part has the same identifier everywhere.
5 | Identity is permanent across generations of schemas. Data modeled today remains addressable when the schema evolves.

SDC Alignment: Level 3+ requires CUID2-style sovereign identity. SDC mandates this at every data element. CUID2 is the structural identifier that the minimum knowledge modeling principle (see "Design principles" above) requires — descriptive labels are metadata that can be revised, translated, or rewritten freely without disturbing structural identity.
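
The separation between structural identity and descriptive labels might look like the sketch below. The identifier generator is a random-token stand-in and does not produce real CUID2 values; ModelComponent and its fields are hypothetical names chosen for illustration.

```python
import secrets
from dataclasses import dataclass, field


def new_structural_id() -> str:
    # Stand-in only: a random token, NOT a real CUID2. A production
    # system would call an actual CUID2 generator here.
    return "id_" + secrets.token_hex(12)


@dataclass
class ModelComponent:
    """A data element whose identity is separate from its labels."""

    structural_id: str = field(default_factory=new_structural_id)
    labels: dict[str, str] = field(default_factory=dict)  # language -> label

    def relabel(self, language: str, label: str) -> None:
        # Revising or translating a label never touches structural_id.
        self.labels[language] = label
```

A component created with an English label can later be renamed or given a Portuguese label, and anything that referenced its structural_id keeps resolving to the same element.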


4. Provenance

Question: Can you trace every value back to its origin, author, and conditions?

Level | Criteria
1 | No audit trail. When a value is wrong, no one knows who entered it, when, or why.
2 | Some audit tables for some systems. Coverage is partial. Logs may be deleted after retention periods.
3 | Every value carries provenance metadata: source, author, timestamp, conditions. Append-only audit is enforced.
4 | Provenance chains across systems. A derived value can be traced back through every transformation.
5 | Provenance is verifiable cryptographically. Any consumer can confirm the lineage without trusting the source system.

SDC Alignment: Level 3+ requires mandatory provenance fields in every data element. Level 5 adds digital signing (VaaS).
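
A minimal sketch of append-only provenance is shown below, assuming a SHA-256 hash chain over the four metadata fields named at Level 3. A hash chain only makes tampering detectable; it is not the digital signing that Level 5 calls for, and ProvenanceLog is a hypothetical name rather than an SDC or VaaS API.

```python
import hashlib
import json
from datetime import datetime, timezone


class ProvenanceLog:
    """Append-only log; each entry commits to the one before it."""

    def __init__(self):
        self._entries: list[dict] = []

    def append(self, value, source: str, author: str, conditions: str) -> dict:
        prev_hash = self._entries[-1]["hash"] if self._entries else "0" * 64
        body = {
            "value": value,
            "source": source,
            "author": author,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "conditions": conditions,
            "prev_hash": prev_hash,
        }
        body["hash"] = self._digest(body)
        self._entries.append(body)
        return body

    @staticmethod
    def _digest(body: dict) -> str:
        # Hash everything except the stored hash itself.
        payload = {k: v for k, v in body.items() if k != "hash"}
        encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
        return hashlib.sha256(encoded).hexdigest()

    def verify(self) -> bool:
        """Recompute every hash and check each link to its predecessor."""
        prev_hash = "0" * 64
        for entry in self._entries:
            if entry["prev_hash"] != prev_hash or entry["hash"] != self._digest(entry):
                return False
            prev_hash = entry["hash"]
        return True
```

Editing any earlier entry changes its digest, so verify fails from that point on. Letting an external party confirm lineage without trusting the source system would additionally require signatures, which this sketch does not attempt.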


5. Interoperability

Question: Can your data cross organizational boundaries without reinterpretation?

Level | Criteria
1 | Data exports are CSV dumps. Receivers interpret column meanings from context or phone calls.
2 | Exports include documentation. Receivers use the documentation to build one-off import scripts.
3 | Exports are self-describing. Schema, constraints, and vocabulary are bound to the data. Receivers verify without interpretation.
4 | Data flows between partner organizations using shared vocabulary. Integration requires no custom mapping code.
5 | Data flows across industries and jurisdictions. Healthcare data used in finance retains its semantic identity.

SDC Alignment: Level 3+ requires semantic binding to ontologies or controlled vocabularies. Level 5 leverages SDC's reference model as the shared substrate.


6. Governance

Question: Are data quality rules documented, enforced, and auditable?

Level | Criteria
1 | No formal governance. Data quality is a collective belief about how things should work.
2 | Governance policies exist in a document. Compliance is checked during audits, not continuously.
3 | Governance rules are enforced automatically at the data layer. Non-compliance is impossible, not just discouraged.
4 | Governance scales without human bottlenecks. Domain teams operate within the rules without requiring central team approval.
5 | Governance is verifiable externally. Regulators, partners, and customers can confirm compliance without trusting your internal processes.

SDC Alignment: Level 3+ requires governance-as-schema. Level 5 leverages SDCStudio's assembly framework and VaaS signing.


The Floor Constraint Principle

The six dimensions are not independent. Schema Integrity, Constraint Enforcement, and Semantic Identity are the foundational dimensions, and together they form the floor constraint that caps the three derived dimensions (Provenance, Interoperability, Governance). An organization cannot achieve Level 3 in Provenance, Interoperability, or Governance if it is below Level 3 in any of the three foundational dimensions.

Example: An organization with Level 2 Schema Integrity cannot achieve Level 3 Provenance. If the schema does not formally define what fields exist, there is no stable target for provenance metadata to attach to.

Practitioners should prioritize bringing an organization to Level 3 in Schema Integrity, Constraint Enforcement, and Semantic Identity before investing in higher levels on the other dimensions.
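
The floor constraint can be sketched as a simple capping rule. The function below assumes one reading of the principle: each derived dimension's effective level is the lower of its raw score and the lowest foundational score. The framework text states the Level 3 case explicitly; extending the cap to every level is an assumption of this sketch, and effective_levels is a hypothetical helper.

```python
FOUNDATIONAL = ("Schema Integrity", "Constraint Enforcement", "Semantic Identity")
DERIVED = ("Provenance", "Interoperability", "Governance")


def effective_levels(raw: dict[str, int]) -> dict[str, int]:
    """Cap each derived dimension at the lowest foundational level."""
    floor = min(raw[d] for d in FOUNDATIONAL)
    # Foundational dimensions keep their raw scores.
    effective = {d: raw[d] for d in FOUNDATIONAL}
    for d in DERIVED:
        effective[d] = min(raw[d], floor)
    return effective
```

With Schema Integrity at Level 2, a raw Provenance score of 3 is reported as an effective 2, which matches the example above.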


Assessment Process

A full assessment consists of:

  1. Structured interview with technical leads and business stakeholders (~90 minutes)
  2. Artifact review - schemas, audit logs, integration documentation, governance policies
  3. Sample data inspection - introspect representative data sources using SDC Agents SMB (optional but recommended)
  4. Scoring - assign 1-5 rating per dimension with evidence
  5. Gap analysis - identify the floor constraint violations that block other dimensions
  6. Recommendation - prescribe specific interventions using the SDC ecosystem or alternative approaches
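
Steps 4 and 5 of the process above can be sketched as a small data structure plus a check. Score and gap_analysis are hypothetical names used for illustration; the sketch assumes that any foundational dimension below Level 3 blocks Level 3 on all three derived dimensions.

```python
from dataclasses import dataclass

ON_TRACK = 3  # Level 3 is the on-track baseline
FOUNDATIONAL = ("Schema Integrity", "Constraint Enforcement", "Semantic Identity")
DERIVED = ("Provenance", "Interoperability", "Governance")


@dataclass
class Score:
    dimension: str
    level: int     # 1-5 rating assigned in step 4
    evidence: str  # interview note or artifact that supports the rating


def gap_analysis(scores: list[Score]) -> dict[str, list[str]]:
    """List foundational dimensions below baseline and what they block."""
    levels = {}
    for s in scores:
        if not 1 <= s.level <= 5:
            raise ValueError(f"{s.dimension}: level must be between 1 and 5")
        levels[s.dimension] = s.level
    violations = [d for d in FOUNDATIONAL if levels[d] < ON_TRACK]
    # Any floor violation blocks Level 3 on every derived dimension.
    blocked = list(DERIVED) if violations else []
    return {"floor_violations": violations, "blocked_dimensions": blocked}
```

Keeping the evidence string on every score means the gap analysis can always be traced back to the interview note or artifact that justified the rating.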

A quick self-assessment quiz (18 questions, ~10 minutes) can produce a preliminary map suitable for initial client conversations. The full assessment is recommended before committing to a remediation engagement.


Visualization

The output of an assessment is a radar chart showing the organization's current score on each of the six dimensions, overlaid with:

  • The on-track baseline (Level 3 on all dimensions)
  • The industry average (based on collected data — initially a practitioner estimate, refined over time as assessment data accumulates)
  • The target state after the proposed remediation

This visual makes it immediately clear where the organization stands, where it needs to go, and how large the gap is on each dimension.
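
For readers who want the geometry spelled out, the sketch below computes the polygon vertices for one radar-chart overlay using only the standard library. A plotting library would normally draw the chart; radar_vertices is a hypothetical helper, and the ordering of the spokes is an assumption.

```python
import math

DIMENSIONS = [
    "Schema Integrity", "Constraint Enforcement", "Semantic Identity",
    "Provenance", "Interoperability", "Governance",
]


def radar_vertices(levels: dict[str, int]) -> list[tuple[float, float]]:
    """Place each dimension on its own spoke, at radius equal to its level."""
    n = len(DIMENSIONS)
    points = []
    for i, name in enumerate(DIMENSIONS):
        angle = 2 * math.pi * i / n  # spokes spaced evenly around the circle
        r = levels[name]
        points.append((r * math.cos(angle), r * math.sin(angle)))
    return points


# The on-track baseline overlay: Level 3 on all six dimensions.
baseline = {name: 3 for name in DIMENSIONS}
```

Calling radar_vertices once per overlay (current score, baseline, target state) yields the polygons to draw on shared axes.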


Versioning

This framework is versioned separately from SDC itself. The v0.1 designation reflects that this is a first draft to be refined with practitioner and client feedback before reaching a stable v1.0.