What is SDC? The Semantic Data Charter Explained
Reading time: ~12 minutes | Level: Beginner | Prerequisites: None
What You'll Learn
- Why traditional data modeling falls short on governance, meaning, and quality
- The three pillars of the Semantic Data Charter: Governance, Semantics, Quality
- The core SDC4 data types and when to use each one
- How governance, semantics, and quality work in practice
- What output formats SDC4 produces
- Which W3C standards SDC4 builds on
The Problem: Data Without Context
Most data systems focus on structure — tables, columns, data types — and stop there. The meaning of the data, how it should be governed, and what to do when values are missing are left implicit, scattered across documentation, or simply absent.
Consider a column called status. What does it mean? Account status? Order status? Health status? Without embedded meaning, every system that touches this data must guess or maintain its own mapping. Multiply that by hundreds of fields and dozens of systems and you get data silos, integration friction, and lost trust.
SDC4 addresses this by making data self-describing. Every data element carries its own governance metadata, semantic links, and quality rules — right in the schema, not in a separate document.
The Three Pillars of SDC4
1. Enforce Governance
Every SDC4 data model includes governance components directly in the schema:
- Audit trails: Who created or modified the data, and when
- Attestation: Formal sign-off that the data is correct
- Provenance: Where the data came from (person, device, software)
- Access control: Tags for linking to external access control systems
- Temporal validity: When data becomes valid, when it expires, when it was recorded
These are not optional add-ons. The Reference Model defines AuditType, AttestationType, PartyType, and ParticipationType as first-class components that travel with the data.
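The Python sketch below makes "travels with the data" concrete: an audit entry is appended whenever a value changes. The class and field names (Party, Audit, Attestation, GovernedValue, record_change) are hypothetical stand-ins for the reference-model components named above. They are not the SDC4 XSD element names.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

# Hypothetical stand-ins for PartyType, AuditType and AttestationType;
# field names are illustrative assumptions, not SDC4 element names.

@dataclass(frozen=True)
class Party:
    name: str
    kind: str  # provenance source, e.g. "person", "device", "software"

@dataclass(frozen=True)
class Audit:
    system_id: str
    user: Party
    timestamp: datetime

@dataclass(frozen=True)
class Attestation:
    attested_by: Party
    attested_on: datetime

@dataclass
class GovernedValue:
    value: object
    audits: list[Audit] = field(default_factory=list)
    attestation: Optional[Attestation] = None
    access_tags: set[str] = field(default_factory=set)  # hooks for external access control

    def record_change(self, new_value: object, by: Party, system_id: str) -> None:
        """Change the value and append an audit entry in the same operation."""
        self.value = new_value
        self.audits.append(Audit(system_id, by, datetime.now(timezone.utc)))
```

The audit list lives on the value object itself. Any system that receives the object therefore also receives its history, and nothing has to be joined in from a separate log.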
2. Embed Meaning (Semantics)
Every data element in SDC4 can be linked to ontologies and controlled vocabularies:
- An email field links to schema.org/email
- A diagnosis_code field links to ICD-10 or SNOMED CT
- A weight field links to UCUM:kg
These links are embedded in the schema definition using RDF annotations. The result: any system that encounters the data can look up its precise meaning without relying on documentation that may be outdated.
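The linking idea can be sketched in a few lines of Python that emit N-Triples, one triple per element-to-concept link. The element URIs under example.org are hypothetical. Using rdfs:isDefinedBy as the linking predicate is also an assumption for illustration, and SDC4's own annotation vocabulary may differ.

```python
from typing import Optional

# Assumed linking predicate; SDC4 may use a different one.
RDFS_IS_DEFINED_BY = "http://www.w3.org/2000/01/rdf-schema#isDefinedBy"

# Hypothetical element URIs mapped to concepts mentioned in this tutorial.
SEMANTIC_LINKS = {
    "https://example.org/dm/email": "https://schema.org/email",
    "https://example.org/dm/systolic": "http://snomed.info/id/271649006",
}


def to_ntriples(links: dict[str, str], predicate: str = RDFS_IS_DEFINED_BY) -> list[str]:
    """Render each element-to-concept link as one N-Triples line, sorted for stable output."""
    return [f"<{element}> <{predicate}> <{concept}> ." for element, concept in sorted(links.items())]


def concept_for(element_uri: str, links: dict[str, str] = SEMANTIC_LINKS) -> Optional[str]:
    """Look up the concept a data element is linked to, or None if it has no link."""
    return links.get(element_uri)
```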
3. Mandate Quality
Real-world data is imperfect. SDC4 handles this explicitly through ExceptionalValue types:
- unknown: the value exists but is not known
- not-applicable: the question does not apply (e.g., pregnancy status for a male patient)
- masked: the value is hidden for privacy or security reasons
Instead of using NULL and hoping consumers know why data is missing, SDC4 encodes the reason directly in the data. This is a foundational shift: data quality rules are part of the model, not an afterthought.
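The same idea can be shown as a minimal Python sketch: a value object that cannot be empty without a stated reason. The enum members mirror the three reasons listed above. The class names, and the rule that a value and a reason are mutually exclusive, are assumptions of this sketch.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ExceptionalValue(Enum):
    UNKNOWN = "unknown"                # value exists but is not known
    NOT_APPLICABLE = "not-applicable"  # question does not apply
    MASKED = "masked"                  # hidden for privacy or security


@dataclass(frozen=True)
class Observed:
    value: Optional[object] = None
    exception: Optional[ExceptionalValue] = None

    def __post_init__(self) -> None:
        # Assumption of this sketch: exactly one of value / exception is set,
        # so an absent value always carries a reason.
        if (self.value is None) == (self.exception is None):
            raise ValueError("set either a value or an exceptional-value reason")


def describe(obs: Observed) -> str:
    """Report either the value or the recorded reason for its absence."""
    if obs.exception is None:
        return f"value={obs.value!r}"
    return f"missing ({obs.exception.value})"
```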
Core Data Types
SDC4 defines a rich set of extended data types (Xd* types) that go beyond standard primitives. Every Xd* type inherits from XdAnyType, which provides label, definition, temporal metadata, and access control.
Text Types
XdString (general-purpose text with rich constraints):
- Regex patterns, min/max length, enumerations
- Use for: names, addresses, emails, codes, descriptions
XdToken (text with normalized whitespace):
- Automatic whitespace collapsing and trimming
- Use for: machine-readable codes, identifiers, API tokens
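Suppose XdToken follows the XML Schema xs:token rule (whiteSpace="collapse"), which is an assumption here. Its normalization then amounts to three steps, sketched below in Python. The function name is hypothetical.

```python
import re


def collapse_whitespace(text: str) -> str:
    """Apply the XSD whiteSpace="collapse" rule to a string."""
    # 1. Tabs, line feeds and carriage returns become spaces.
    replaced = re.sub(r"[\t\n\r]", " ", text)
    # 2. Runs of spaces shrink to one; 3. leading/trailing spaces are removed.
    return re.sub(r" {2,}", " ", replaced).strip(" ")
```

For example, `collapse_whitespace("  ABC\t 123\n")` yields `"ABC 123"`. Two systems that normalize the same way will therefore compare codes equal even if one of them padded or line-wrapped the input.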
Boolean Type
XdBoolean (true/false logical values):
- Use for: flags, consent indicators, feature toggles
Numeric Types
XdCount (non-negative integers with units):
- Use for: item quantities, headcounts, share counts
- Example: 100 shares, 42 items
XdQuantity (decimal values with required units):
- Use for: measurements, currency amounts, scientific values
- Example: 120.0 mmHg, 199.99 USD, 37.2 °C
XdFloat / XdDouble (IEEE 754 floating-point):
- Use for: scientific calculations, coordinates
- Prefer XdQuantity for financial data to avoid rounding issues
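The rounding concern is easy to demonstrate. The Python sketch below sums the same two amounts once as binary floats and once as decimals. It assumes XdQuantity magnitudes behave like exact base-10 decimals (as xs:decimal does), which is the premise of the advice above. The helper names are hypothetical.

```python
from decimal import Decimal


def total_float(amounts: list[str]) -> float:
    # Binary floating point cannot represent 0.1 or 0.2 exactly.
    total = 0.0
    for a in amounts:
        total += float(a)
    return total


def total_decimal(amounts: list[str]) -> Decimal:
    # Decimal keeps the base-10 digits exactly as written.
    total = Decimal("0")
    for a in amounts:
        total += Decimal(a)
    return total
```

With `["0.10", "0.20"]`, the float total is `0.30000000000000004` while the decimal total is exactly `0.30`.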
Temporal Type
XdTemporal (flexible date/time with variable granularity):
- Full datetime, date only, time only, or partial (year-month, year)
- ISO 8601 format
- Use for: timestamps, birth dates, expiration dates
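Variable granularity means a consumer must know how precise a value is before using it. The Python sketch below classifies a simplified subset of ISO 8601 strings by granularity. The function name and the chosen subset are assumptions. Full ISO 8601 (week dates, ordinal dates, durations) is not covered.

```python
import re
from datetime import date, datetime, time


def granularity(value: str) -> str:
    """Classify a simplified ISO 8601 string; raises ValueError if it is not valid."""
    if re.fullmatch(r"\d{4}", value):
        return "year"
    if re.fullmatch(r"\d{4}-(0[1-9]|1[0-2])", value):
        return "year-month"
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        date.fromisoformat(value)  # rejects impossible dates such as 2025-02-30
        return "date"
    if "T" in value:
        # Older Python versions do not accept a trailing "Z", so map it to +00:00.
        datetime.fromisoformat(value.replace("Z", "+00:00"))
        return "datetime"
    time.fromisoformat(value)
    return "time"
```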
Ordinal Type
XdOrdinal (ordered categories with label-code pairs):
- Use for: severity levels, priority rankings, Likert scales
- Example: mild=1, moderate=2, severe=3
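An ordinal differs from a plain string because its codes impose an order. The minimal Python sketch below uses the severity example above. The Ordinal class and the at_least helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Ordinal:
    code: int   # rank; compared first because it is the first field
    label: str  # human-readable category


# Hypothetical severity scale from the example above.
SEVERITY = {o.label: o for o in (Ordinal(1, "mild"), Ordinal(2, "moderate"), Ordinal(3, "severe"))}


def at_least(observed: str, threshold: str) -> bool:
    """True if the observed category ranks at or above the threshold."""
    return SEVERITY[observed] >= SEVERITY[threshold]
```

A plain string comparison would put "moderate" above "mild" only by alphabetical accident. The code-based comparison keeps the intended order no matter how the labels are spelled.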
Reference and Binary Types
XdLink (references to other models or external resources):
- Use for: relationships between data models, ontology URIs
XdFile (binary data or file references with integrity checking):
- Use for: document attachments, images, encrypted files
- Includes: MIME type, file size, hash verification
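Hash verification for a file reference can be sketched with Python's standard hashlib. SHA-256 is an assumption here, because the text does not say which hash algorithm XdFile uses. The helper names are also hypothetical.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Hex digest to store alongside the file reference (SHA-256 is assumed)."""
    return hashlib.sha256(data).hexdigest()


def verify_file(data: bytes, expected_hex: str, expected_size: int) -> bool:
    """Check the size first (cheap), then compare digests in constant time."""
    if len(data) != expected_size:
        return False
    return hmac.compare_digest(sha256_hex(data), expected_hex.lower())
```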
Structural Types
ClusterType (container that groups related components):
- Supports flexible grouping for structured data
- Example: a "Blood Pressure" cluster containing Systolic and Diastolic components
XdAdapterType (wrapper that adapts an Xd* type for inclusion in a Cluster):
- Every data component inside a Cluster is wrapped in an Adapter
Type Selection Quick Guide
Is it text? → XdString (or XdToken for codes)
Is it true/false? → XdBoolean
Is it a whole number? → XdCount
Is it a decimal + units? → XdQuantity
Is it a date or time? → XdTemporal
Is it an ordered ranking? → XdOrdinal
Is it a link/reference? → XdLink
Is it a file/binary? → XdFile
Is it a group of fields? → ClusterType
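The same decision list can be expressed as a Python heuristic over sample values. This is a hypothetical helper for illustration, not an SDCStudio feature. It ignores XdToken, which is a normalization choice that a sample value cannot reveal, and it assumes that unitless floats map to XdFloat.

```python
from datetime import date, time
from decimal import Decimal
from typing import Optional


def suggest_type(value: object, *, unit: Optional[str] = None, ordered: bool = False) -> str:
    """Hypothetical heuristic mirroring the quick guide; the order of checks matters."""
    if isinstance(value, dict):
        return "ClusterType"   # a group of fields
    if isinstance(value, bool):
        return "XdBoolean"     # checked before int: bool is a subclass of int
    if ordered:
        return "XdOrdinal"     # the caller says the categories are ranked
    if isinstance(value, (bytes, bytearray)):
        return "XdFile"
    if isinstance(value, (date, time)):
        return "XdTemporal"    # datetime is a subclass of date
    if isinstance(value, int) and value >= 0:
        return "XdCount"
    if isinstance(value, float) and unit is None:
        return "XdFloat"       # assumption: unitless floats are scientific values
    if isinstance(value, (int, float, Decimal)):
        return "XdQuantity"    # decimal or signed number; a unit is expected
    if isinstance(value, str):
        return "XdLink" if value.startswith(("http://", "https://")) else "XdString"
    raise TypeError(f"no suggestion for {type(value).__name__}")
```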
How Governance Works in Practice
An SDC4 data instance wraps data in a DMType root element that carries governance and provenance metadata alongside the payload:
```
DMType (root)
├── dm-label: "Patient Vitals"
├── dm-language: "en-US"
├── creation_timestamp: when this instance was created
├── instance_id: globally unique ID
├── data: the actual payload (Cluster of components)
├── subject: the person/entity the data is about
├── provider: who provided the data
├── audit: system/user trail
├── attestation: formal sign-off
└── workflow: state machine definition
```
The workflow element can carry a full state machine definition — valid states, transitions, and guard conditions — turning each data instance into a smart packet that embeds its own behavioral rules.
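A state machine with guards fits in a few lines. This Python sketch is hypothetical throughout: the state names (draft, submitted, approved), the Workflow class, and the guard that requires an attestation before submission are illustrative assumptions. They are not the SDC4 workflow schema.

```python
from dataclasses import dataclass, field
from typing import Callable

# A guard inspects the data instance and decides whether a transition may fire.
Guard = Callable[[dict], bool]


@dataclass
class Workflow:
    state: str
    transitions: dict[tuple[str, str], Guard] = field(default_factory=dict)

    def allow(self, src: str, dst: str, guard: Guard = lambda instance: True) -> None:
        """Register a valid transition, optionally protected by a guard condition."""
        self.transitions[(src, dst)] = guard

    def advance(self, dst: str, instance: dict) -> None:
        """Move to dst only if the transition exists and its guard accepts the instance."""
        guard = self.transitions.get((self.state, dst))
        if guard is None:
            raise ValueError(f"no transition {self.state} -> {dst}")
        if not guard(instance):
            raise PermissionError(f"guard rejected {self.state} -> {dst}")
        self.state = dst
```

Usage: `wf = Workflow("draft")`, then `wf.allow("draft", "submitted", lambda inst: inst.get("attestation") is not None)`. With these rules, the instance cannot leave draft until it carries a sign-off.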
How Semantics Work in Practice
Semantic meaning is embedded at the schema definition level, not the instance level. When SDCStudio generates an XSD for your data model, each component's label element is fixed to an immutable value:
```
Schema definition (permanent):
  mc-clj5x2p4k... → label fixed="Systolic Blood Pressure"
                  → RDF annotation: SNOMED CT 271649006

Instance data (runtime):
  ms-clj5x2p4k... → xdquantity-value: 120
                  → xdquantity-units: mmHg
```
The mc- prefix identifies the model component (type definition). The ms- prefix identifies the model substitution (element reference). Both use CUID2 identifiers that are permanent and collision-resistant. This separation means the structure is immutable and reusable across models, while the meaning is explicitly stated and machine-readable.
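The split between definition and instance can be sketched as a lookup. Instance elements carry only values, and a consumer recovers meaning by matching the ms- element's CUID to its mc- definition. The short ID abc123 is a placeholder, not real CUID2 output. The dictionary layout and the interpret helper are also assumptions.

```python
# Schema side: fixed meaning keyed by CUID (placeholder ID, not real CUID2 output).
SCHEMA = {
    "abc123": {
        "label": "Systolic Blood Pressure",
        "concept": "http://snomed.info/id/271649006",
    },
}


def interpret(instance: dict) -> dict:
    """Resolve each ms-<cuid> element to the fixed label of its mc-<cuid> definition."""
    resolved = {}
    for element, payload in instance.items():
        prefix, _, cuid = element.partition("-")
        if prefix != "ms":
            raise ValueError(f"not a model-substitution element: {element}")
        definition = SCHEMA[cuid]  # mc-<cuid> and ms-<cuid> share the same CUID
        resolved[definition["label"]] = payload
    return resolved
```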
How Quality Works in Practice
Every Xd type can carry an ExceptionalValue element that explains why a value is absent or unusual:
```xml
<BloodType xsi:type="XdStringType">
  <label>Blood Type</label>
  <data/>
  <exceptionValue>unknown</exceptionValue>
  <audit>
    <createdBy>nurse</createdBy>
    <createdOn>2025-11-02T10:00:00Z</createdOn>
    <rationale>Patient unable to recall</rationale>
  </audit>
</BloodType>
```
This is far more informative than a bare NULL. Downstream systems can distinguish "we asked but don't know" from "the question doesn't apply" from "the value is hidden for privacy."
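A downstream consumer can branch on the recorded reason instead of treating the element as simply empty. The Python sketch below uses the standard xml.etree.ElementTree on a trimmed copy of the BloodType example. It adds an xmlns:xsi declaration because a standalone parser rejects an undeclared xsi prefix. The read_value helper is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Trimmed copy of the example; xmlns:xsi is declared so the fragment parses on its own.
SAMPLE = """<BloodType xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="XdStringType">
  <label>Blood Type</label>
  <data/>
  <exceptionValue>unknown</exceptionValue>
</BloodType>"""


def read_value(xml_text: str) -> tuple[Optional[str], Optional[str]]:
    """Return (value, None) for real data, or (None, reason) when an exceptional value is recorded."""
    root = ET.fromstring(xml_text)
    reason = root.findtext("exceptionValue")  # None if the element is absent
    if reason:
        return None, reason.strip()
    return root.findtext("data"), None
```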
Standards Foundation
SDC4 is built on established W3C open standards:
| Standard | Role in SDC4 |
|---|---|
| XML Schema (XSD) | Structural backbone — all models defined as XSD restrictions |
| RDF | Semantic annotations — data maps to RDF triples |
| OWL | Ontology alignment — semantic models expressible in OWL |
| SPARQL | Querying — SDC4 data queryable via SPARQL |
| SHACL | Validation — constraint shapes for RDF data |
By building on these standards, SDC4 works with existing tooling: XML validators, triple stores, SPARQL endpoints, and knowledge graph platforms.
Output Formats
An SDC4 data model can produce multiple output formats:
| Format | Purpose |
|---|---|
| XSD | XML Schema for structural validation |
| XML | Instance documents with sample data |
| JSON | Instance data for API development |
| JSON-LD | Semantic schema for linked data integration |
| HTML | Human-readable documentation |
| RDF | Triples for semantic web and knowledge graphs |
| SHACL | Shape constraints for RDF validation |
| GQL | CREATE statements for property graph databases |
One model definition, eight output formats — each serving a different use case while maintaining the same underlying semantics and governance.
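The single-source idea can be illustrated with two serializers that read one in-memory definition. This Python sketch is hypothetical: the MODEL layout and both functions are assumptions for illustration. They do not reproduce SDCStudio's actual JSON or JSON-LD output.

```python
import json

# Hypothetical in-memory model definition; one component shown.
MODEL = {
    "label": "Patient Vitals",
    "components": [
        {"name": "systolic", "concept": "http://snomed.info/id/271649006",
         "value": 120, "units": "mmHg"},
    ],
}


def to_json(model: dict) -> str:
    """Plain JSON: values and units only, convenient for APIs."""
    body = {c["name"]: {"value": c["value"], "units": c["units"]} for c in model["components"]}
    return json.dumps(body, sort_keys=True)


def to_jsonld(model: dict) -> str:
    """JSON-LD: the same values plus an @context mapping each name to its concept URI."""
    doc = {"@context": {c["name"]: {"@id": c["concept"]} for c in model["components"]}}
    doc.update({c["name"]: c["value"] for c in model["components"]})
    return json.dumps(doc, sort_keys=True)
```

Both functions read the same MODEL, so a change to a concept link appears in every output that uses it.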
Data Longevity
A distinctive property of SDC4 is migration avoidance. Because structural components use permanent CUID2 identifiers and semantic meaning is fixed in the schema, data created today remains interpretable indefinitely. New applications can read historical SDC4 data without migration scripts, because:
- The ms-<cuid> element always has the same structure
- The mc-<cuid> type definition is immutable
- The label element provides explicit, permanent meaning
This significantly reduces long-term data management costs.
Summary
SDC4 is a framework for creating data models that are:
- Self-governing: Audit trails, attestation, and provenance travel with the data
- Self-describing: Semantic links to ontologies make meaning explicit and machine-readable
- Quality-aware: Missing and exceptional data is handled explicitly, not silently
- Standards-based: Built on W3C XSD, RDF, OWL, SPARQL, and SHACL
- Multi-format: One model produces XSD, XML, JSON, JSON-LD, RDF, SHACL, GQL, and HTML
- Long-lived: Immutable component identifiers and fixed labels avoid costly data migrations
Next Tutorial
SDCStudio Overview — From CSV to Published Schema — See how SDCStudio uses AI to turn a data file into a complete SDC4-compliant model.