What is SDC? The Semantic Data Charter Explained

Reading time: ~12 minutes | Level: Beginner | Prerequisites: None

What You'll Learn

  • Why traditional data modeling falls short on governance, meaning, and quality
  • The three pillars of the Semantic Data Charter: Governance, Semantics, Quality
  • The core SDC4 data types and when to use each one
  • How governance, semantics, and quality work in practice
  • What output formats SDC4 produces
  • Which W3C standards SDC4 builds on

The Problem: Data Without Context

Most data systems focus on structure — tables, columns, data types — and stop there. The meaning of the data, how it should be governed, and what to do when values are missing are left implicit, scattered across documentation, or simply absent.

Consider a column called status. What does it mean? Account status? Order status? Health status? Without embedded meaning, every system that touches this data must guess or maintain its own mapping. Multiply that by hundreds of fields and dozens of systems and you get data silos, integration friction, and lost trust.

SDC4 addresses this by making data self-describing. Every data element carries its own governance metadata, semantic links, and quality rules — right in the schema, not in a separate document.

The Three Pillars of SDC4

1. Enforce Governance

Every SDC4 data model includes governance components directly in the schema:

  • Audit trails: Who created or modified the data, and when
  • Attestation: Formal sign-off that the data is correct
  • Provenance: Where the data came from (person, device, software)
  • Access control: Tags for linking to external access control systems
  • Temporal validity: When data becomes valid, when it expires, when it was recorded

These are not optional add-ons. The Reference Model defines AuditType, AttestationType, PartyType, and ParticipationType as first-class components that travel with the data.
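As a rough illustration of governance metadata travelling with a value, here is a minimal Python sketch. The class and field names (`GovernedValue`, `AuditEntry`, `valid_from`, `valid_until`, `recorded_at`, `access_tag`) are hypothetical stand-ins, not the Reference Model's actual element names. The sketch only shows the idea of an audit entry plus a temporal validity check living next to the payload.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class AuditEntry:
    # Hypothetical audit record: who touched the data, what they did, and when.
    actor: str
    action: str
    at: datetime


@dataclass
class GovernedValue:
    # Hypothetical wrapper: the payload plus governance metadata that travels with it.
    value: object
    recorded_at: datetime
    valid_from: Optional[datetime] = None   # None means "no lower bound"
    valid_until: Optional[datetime] = None  # None means "never expires"; the bound is exclusive
    access_tag: Optional[str] = None        # key into an external access-control system
    audit: list[AuditEntry] = field(default_factory=list)

    def is_valid_at(self, moment: datetime) -> bool:
        """Return True if `moment` falls inside the half-open window [valid_from, valid_until)."""
        if self.valid_from is not None and moment < self.valid_from:
            return False
        if self.valid_until is not None and moment >= self.valid_until:
            return False
        return True


utc = timezone.utc
reading = GovernedValue(
    value=120,
    recorded_at=datetime(2025, 11, 2, 10, 0, tzinfo=utc),
    valid_from=datetime(2025, 11, 2, tzinfo=utc),
    valid_until=datetime(2025, 12, 2, tzinfo=utc),
    access_tag="clinical-staff",
    audit=[AuditEntry("nurse", "create", datetime(2025, 11, 2, 10, 0, tzinfo=utc))],
)
print(reading.is_valid_at(datetime(2025, 11, 15, tzinfo=utc)))  # True
print(reading.is_valid_at(datetime(2026, 1, 1, tzinfo=utc)))    # False
```

Treating the end of the window as exclusive is a design choice made for this sketch; the text does not say how SDC4 interprets the boundary.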

2. Embed Meaning (Semantics)

Every data element in SDC4 can be linked to ontologies and controlled vocabularies:

  • An email field links to schema.org/email
  • A diagnosis_code field links to ICD-10 or SNOMED CT
  • A weight field links to UCUM:kg

These links are embedded in the schema definition using RDF annotations. The result: any system that encounters the data can look up its precise meaning without relying on documentation that may be outdated.
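To make the contrast with the `status` example concrete, here is a minimal Python sketch of a schema that carries its own semantic links, so a consumer resolves a field's meaning from the model instead of from separate documentation. The dictionary layout and the `meaning_of` helper are hypothetical; SDC4 itself stores these links as RDF annotations in the XSD.

```python
from typing import Optional

# Hypothetical in-memory view of the semantic links an SDC4 schema would embed as RDF annotations.
# The identifiers follow the examples in the text; the dict layout is illustrative only.
SCHEMA_LINKS: dict[str, str] = {
    "email": "https://schema.org/email",
    "weight": "UCUM:kg",
}


def meaning_of(field_name: str) -> Optional[str]:
    """Look up a field's semantic link in the schema itself; None means the meaning is implicit."""
    return SCHEMA_LINKS.get(field_name)


for name in ("email", "weight", "status"):
    link = meaning_of(name)
    print(f"{name}: {link if link else 'no embedded meaning (consumers must guess)'}")
```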

3. Mandate Quality

Real-world data is imperfect. SDC4 handles this explicitly through ExceptionalValue types:

  • unknown — the value exists but is not known
  • not-applicable — the question does not apply (e.g., pregnancy status for a male patient)
  • masked — the value is hidden for privacy or security reasons

Instead of using NULL and hoping consumers know why data is missing, SDC4 encodes the reason directly in the data. This is a foundational shift: data quality rules are part of the model, not an afterthought.
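The difference from a bare NULL can be sketched in Python as an enum of reasons carried next to an empty value. The names `ExceptionalValue`, `Observation`, and `describe` are hypothetical and simply mirror the three reasons listed above; SDC4 encodes the reason as an element in the XML instance, not as a Python object.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ExceptionalValue(Enum):
    # The three reasons listed above; this enum is an illustration, not SDC4's encoding.
    UNKNOWN = "unknown"
    NOT_APPLICABLE = "not-applicable"
    MASKED = "masked"


@dataclass
class Observation:
    value: Optional[str] = None
    exception: Optional[ExceptionalValue] = None

    def __post_init__(self) -> None:
        # A missing value must say why it is missing; a bare None is rejected.
        if self.value is None and self.exception is None:
            raise ValueError("a missing value must carry an ExceptionalValue reason")


REASONS = {
    ExceptionalValue.UNKNOWN: "asked, but the answer is not known",
    ExceptionalValue.NOT_APPLICABLE: "the question does not apply",
    ExceptionalValue.MASKED: "withheld for privacy or security",
}


def describe(obs: Observation) -> str:
    """Explain an observation, preferring the recorded reason when one is present."""
    if obs.exception is not None:
        return REASONS[obs.exception]
    return f"value: {obs.value}"


print(describe(Observation(exception=ExceptionalValue.NOT_APPLICABLE)))  # the question does not apply
print(describe(Observation(value="A+")))  # value: A+
```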

Core Data Types

SDC4 defines a rich set of extended data types (Xd* types) that go beyond standard primitives. Every Xd* type inherits from XdAnyType, which provides a label, a definition, temporal metadata, and access control.

Text Types

XdString — General-purpose text with rich constraints:
  • Regex patterns, min/max length, enumerations
  • Use for: names, addresses, emails, codes, descriptions

XdToken — Normalized whitespace text:
  • Automatic whitespace collapsing and trimming
  • Use for: machine-readable codes, identifiers, API tokens
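The two text behaviors can be approximated in a few lines of Python: checking a value against pattern, length, and enumeration constraints, and normalizing whitespace in the way XML Schema's `xs:token` collapse rule does (tab, newline, and carriage return become spaces, runs collapse to one space, ends are trimmed). The function names and parameters are hypothetical; in SDC4 these constraints live in the XSD and are enforced by a schema validator, and Python's regex dialect is close to, but not identical to, XSD's.

```python
import re
from typing import Optional, Sequence


def validate_xdstring(
    value: str,
    pattern: Optional[str] = None,
    min_length: Optional[int] = None,
    max_length: Optional[int] = None,
    enumeration: Optional[Sequence[str]] = None,
) -> list[str]:
    """Return a list of constraint violations (an empty list means the value is valid)."""
    errors = []
    # XSD patterns implicitly match the whole value, so use fullmatch rather than search.
    if pattern is not None and re.fullmatch(pattern, value) is None:
        errors.append(f"does not match pattern {pattern!r}")
    if min_length is not None and len(value) < min_length:
        errors.append(f"shorter than {min_length}")
    if max_length is not None and len(value) > max_length:
        errors.append(f"longer than {max_length}")
    if enumeration is not None and value not in enumeration:
        errors.append(f"not one of {list(enumeration)}")
    return errors


def normalize_xdtoken(value: str) -> str:
    """Collapse runs of tab/newline/CR/space into one space and trim both ends."""
    return re.sub(r"[\t\n\r ]+", " ", value).strip(" ")


print(validate_xdstring("alice@example.org", pattern=r"[^@\s]+@[^@\s]+", max_length=254))  # []
print(validate_xdstring("O-", enumeration=["A+", "A-", "B+", "B-", "AB+", "AB-", "O+", "O-"]))  # []
print(normalize_xdtoken("  ORD-\t 42 \n"))  # ORD- 42
```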

Boolean Type

XdBoolean — True/false logical values:
  • Use for: flags, consent indicators, feature toggles

Numeric Types

XdCount — Non-negative integers with units:
  • Use for: item quantities, headcounts, share counts
  • Example: 100 shares, 42 items

XdQuantity — Decimal values with required units:
  • Use for: measurements, currency amounts, scientific values
  • Example: 120.0 mmHg, 199.99 USD, 37.2 °C

XdFloat / XdDouble — IEEE 754 floating-point:
  • Use for: scientific calculations, coordinates
  • Prefer XdQuantity for financial data to avoid rounding issues
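The advice to prefer XdQuantity for money comes down to binary versus decimal arithmetic. This short Python sketch contrasts the two and pairs each magnitude with a unit, the way XdCount and XdQuantity do. The `Count` and `Quantity` classes are hypothetical illustrations, not SDC4 definitions.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Count:
    # Hypothetical XdCount analogue: a non-negative integer plus its unit.
    magnitude: int
    units: str

    def __post_init__(self) -> None:
        if self.magnitude < 0:
            raise ValueError("a count cannot be negative")


@dataclass(frozen=True)
class Quantity:
    # Hypothetical XdQuantity analogue: an exact decimal plus a required unit.
    magnitude: Decimal
    units: str

    def __add__(self, other: "Quantity") -> "Quantity":
        # Adding mismatched units is almost always a bug, so refuse it.
        if self.units != other.units:
            raise ValueError(f"cannot add {self.units} to {other.units}")
        return Quantity(self.magnitude + other.magnitude, self.units)


# Binary floating point cannot represent 0.10 exactly, so the sum drifts.
print(0.10 + 0.20 == 0.30)  # False

# Decimal keeps the exact base-10 value, which is what currency amounts need.
total = Quantity(Decimal("0.10"), "USD") + Quantity(Decimal("0.20"), "USD")
print(total.magnitude == Decimal("0.30"))  # True
print(Count(42, "items"))
```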

Temporal Type

XdTemporal — Flexible date/time with variable granularity:
  • Full datetime, date only, time only, or partial (year-month, year)
  • ISO 8601 format
  • Use for: timestamps, birth dates, expiration dates
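Variable granularity means a value such as `1987` or `1987-06` is legitimate and should not be padded into an invented full date. A minimal Python sketch that classifies an ISO 8601 string by granularity, assuming only the basic forms listed above (it does not handle week dates, ordinal dates, or durations). The `granularity` function is hypothetical, not part of SDC4.

```python
import re
from datetime import date, datetime, time


def granularity(value: str) -> str:
    """Classify an ISO 8601 string as year, year-month, date, time, or datetime.

    Raises ValueError for strings that cannot be parsed as any of these forms.
    """
    if re.fullmatch(r"\d{4}", value):
        return "year"
    if re.fullmatch(r"\d{4}-(0[1-9]|1[0-2])", value):
        return "year-month"
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        date.fromisoformat(value)  # rejects impossible dates such as 2025-02-30
        return "date"
    # Python versions before 3.11 reject a trailing "Z" in fromisoformat, so normalize it.
    normalized = value.replace("Z", "+00:00")
    if "T" in value:
        datetime.fromisoformat(normalized)
        return "datetime"
    time.fromisoformat(normalized)
    return "time"


for v in ("1987", "1987-06", "1987-06-15", "10:30:00", "2025-11-02T10:00:00Z"):
    print(v, "->", granularity(v))
```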

Ordinal Type

XdOrdinal — Ordered categories with label-code pairs:
  • Use for: severity levels, priority rankings, Likert scales
  • Example: mild=1, moderate=2, severe=3
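Because XdOrdinal pairs each label with a code, ordering and comparison follow the code rather than the text of the label. A minimal Python sketch using the severity example above; the `Severity` enum and `from_label` helper are hypothetical stand-ins for an ordinal definition, not SDC4 syntax.

```python
from enum import IntEnum


class Severity(IntEnum):
    # label = code, mirroring the example mild=1, moderate=2, severe=3
    MILD = 1
    MODERATE = 2
    SEVERE = 3


def from_label(label: str) -> Severity:
    """Resolve a human-readable label to its ordinal member (case-insensitive; KeyError if unknown)."""
    return Severity[label.strip().upper()]


readings = [from_label(x) for x in ("severe", "mild", "moderate")]
print([r.name.lower() for r in sorted(readings)])  # ['mild', 'moderate', 'severe']
print(from_label("moderate") > from_label("mild"))  # True
print(int(max(readings)))  # 3
```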

Reference and Binary Types

XdLink — References to other models or external resources:
  • Use for: relationships between data models, ontology URIs

XdFile — Binary data or file references with integrity checking:
  • Use for: document attachments, images, encrypted files
  • Includes: MIME type, file size, hash verification
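Hash verification amounts to recomputing a digest over the received bytes and comparing it with the one stored alongside the file reference. A minimal Python sketch, assuming SHA-256 as the algorithm (the text does not say which hash SDC4 uses) and a hypothetical `FileRef` record holding the MIME type, size, and digest:

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class FileRef:
    # Hypothetical XdFile-like metadata stored next to the content.
    mime_type: str
    size: int
    sha256_hex: str


def describe_bytes(content: bytes, mime_type: str) -> FileRef:
    """Build the metadata record at the time the file is attached."""
    return FileRef(mime_type, len(content), hashlib.sha256(content).hexdigest())


def verify(content: bytes, ref: FileRef) -> bool:
    """Check the size first (cheap), then compare digests in constant time."""
    if len(content) != ref.size:
        return False
    return hmac.compare_digest(hashlib.sha256(content).hexdigest(), ref.sha256_hex)


original = b"%PDF-1.7 example bytes"
ref = describe_bytes(original, "application/pdf")
print(verify(original, ref))                 # True
print(verify(original + b" tampered", ref))  # False
```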

Structural Types

ClusterType — Container that groups related components:
  • Supports flexible grouping for structured data
  • Example: a "Blood Pressure" cluster containing Systolic and Diastolic components

XdAdapterType — Wrapper that adapts an Xd* type for inclusion in a Cluster:
  • Every data component inside a Cluster is wrapped in an Adapter
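The Cluster, Adapter, and component nesting can be pictured with plain Python dataclasses. This is a hypothetical in-memory model of the Blood Pressure example, not SDC4's XML representation; the class names only echo the SDC4 type names, and the sketch shows nothing more than a Cluster holding Adapters, each of which wraps exactly one typed component.

```python
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass
class XdQuantity:
    # Hypothetical stand-in for an XdQuantity component.
    label: str
    magnitude: Decimal
    units: str


@dataclass
class XdAdapter:
    # Wraps exactly one data component so it can sit inside a Cluster.
    component: XdQuantity


@dataclass
class Cluster:
    label: str
    items: list[XdAdapter] = field(default_factory=list)

    def find(self, label: str) -> XdQuantity:
        """Return the wrapped component with the given label (KeyError if absent)."""
        for adapter in self.items:
            if adapter.component.label == label:
                return adapter.component
        raise KeyError(label)


bp = Cluster(
    label="Blood Pressure",
    items=[
        XdAdapter(XdQuantity("Systolic", Decimal("120"), "mmHg")),
        XdAdapter(XdQuantity("Diastolic", Decimal("80"), "mmHg")),
    ],
)
systolic = bp.find("Systolic")
print(f"{systolic.magnitude} {systolic.units}")  # 120 mmHg
```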

Type Selection Quick Guide

Is it text?               → XdString (or XdToken for codes)
Is it true/false?         → XdBoolean
Is it a whole number?     → XdCount
Is it a decimal + units?  → XdQuantity
Is it a date or time?     → XdTemporal
Is it an ordered ranking? → XdOrdinal
Is it a link/reference?   → XdLink
Is it a file/binary?      → XdFile
Is it a group of fields?  → ClusterType
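The quick guide above reads naturally as a decision function. A rough Python sketch that applies it to sample values; the mapping from Python types to SDC4 types is an assumption made for illustration (for example, any `str` becomes XdString and any `date` or `time` becomes XdTemporal), and `suggest_type` is a hypothetical helper. Cases such as ordinal rankings or links still need human judgment, so the function falls back to "unclear" for them.

```python
from datetime import date, time
from decimal import Decimal


def suggest_type(value: object, *, has_units: bool = False, is_code: bool = False) -> str:
    """Suggest an SDC4 type for a sample value, following the quick guide's questions."""
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "XdBoolean"
    if isinstance(value, str):
        return "XdToken" if is_code else "XdString"
    if isinstance(value, int) and value >= 0:
        return "XdCount"
    if isinstance(value, (Decimal, float)) and has_units:
        return "XdQuantity"
    if isinstance(value, float):
        return "XdFloat / XdDouble"
    if isinstance(value, (date, time)):  # datetime is a subclass of date
        return "XdTemporal"
    if isinstance(value, bytes):
        return "XdFile"
    if isinstance(value, dict):
        return "ClusterType"
    return "unclear: decide by hand"


samples = [
    ("Alice", {}),
    ("ORD-42", {"is_code": True}),
    (True, {}),
    (42, {}),
    (Decimal("37.2"), {"has_units": True}),
    (date(1987, 6, 15), {}),
    ({"systolic": 120, "diastolic": 80}, {}),
]
for value, hints in samples:
    print(repr(value), "->", suggest_type(value, **hints))
```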

How Governance Works in Practice

An SDC4 data instance wraps data in a DMType root element that carries governance and provenance metadata alongside the payload:

DMType (root)
├── dm-label: "Patient Vitals"
├── dm-language: "en-US"
├── creation_timestamp: when this instance was created
├── instance_id: globally unique ID
├── data: the actual payload (Cluster of components)
├── subject: the person/entity the data is about
├── provider: who provided the data
├── audit: system/user trail
├── attestation: formal sign-off
└── workflow: state machine definition

The workflow element can carry a full state machine definition — valid states, transitions, and guard conditions — turning each data instance into a smart packet that embeds its own behavioral rules.
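To illustrate what an embedded state machine with guard conditions might look like, here is a minimal Python sketch. The states (`draft`, `submitted`, `attested`), the transition table, and the guard that requires a signer before the final state are all hypothetical; the text does not specify SDC4's workflow vocabulary.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class Instance:
    # Hypothetical data instance carrying its own workflow state.
    state: str = "draft"
    attested_by: str = ""
    history: list[str] = field(default_factory=list)


Guard = Callable[[Instance], bool]

# (from_state, event) -> (to_state, guard). The guard must return True for the move to happen.
TRANSITIONS: Dict[Tuple[str, str], Tuple[str, Guard]] = {
    ("draft", "submit"): ("submitted", lambda inst: True),
    ("submitted", "attest"): ("attested", lambda inst: bool(inst.attested_by)),
    ("submitted", "reject"): ("draft", lambda inst: True),
}


def fire(inst: Instance, event: str) -> None:
    """Apply an event, enforcing both the transition table and its guard."""
    key = (inst.state, event)
    if key not in TRANSITIONS:
        raise ValueError(f"event {event!r} is not allowed in state {inst.state!r}")
    target, guard = TRANSITIONS[key]
    if not guard(inst):
        raise PermissionError(f"guard blocked {inst.state!r} -> {target!r}")
    inst.history.append(f"{inst.state} -> {target}")
    inst.state = target


doc = Instance()
fire(doc, "submit")
try:
    fire(doc, "attest")  # blocked: nobody has signed yet
except PermissionError as err:
    print(err)
doc.attested_by = "Dr. Lee"
fire(doc, "attest")
print(doc.state, doc.history)
```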

How Semantics Work in Practice

Semantic meaning is embedded at the schema definition level, not the instance level. When SDCStudio generates an XSD for your data model, each component's label element is fixed to an immutable value:

Schema definition (permanent):
  mc-clj5x2p4k... → label fixed="Systolic Blood Pressure"
                   → RDF annotation: SNOMED CT 271649006

Instance data (runtime):
  ms-clj5x2p4k... → xdquantity-value: 120
                   → xdquantity-units: mmHg

The mc- prefix identifies the model component (type definition). The ms- prefix identifies the model substitution (element reference). Both use CUID2 identifiers that are permanent and collision-resistant. This separation means the structure is immutable and reusable across models, while the meaning is explicitly stated and machine-readable.
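The split between permanent schema meaning and runtime values can be shown as a lookup keyed by the shared identifier. In this minimal Python sketch, `examplecuid0000000000001` is a made-up placeholder (the identifier in the text is truncated), the regex is only a loose CUID2-like shape check, and the `SCHEMA` dictionary stands in for what a real consumer would read from the XSD.

```python
import re

# Hypothetical schema-level facts, as a consumer might extract them from the XSD.
# Key: the identifier shared by mc-<cuid> (type definition) and ms-<cuid> (element).
SCHEMA = {
    "examplecuid0000000000001": {
        "label": "Systolic Blood Pressure",
        "concept": "SNOMED CT 271649006",
    }
}

# Loose shape check: prefix, then lowercase letters/digits starting with a letter.
ELEMENT_NAME = re.compile(r"(mc|ms)-([a-z][a-z0-9]+)")


def interpret(element_name: str, value: str, units: str) -> str:
    """Combine an instance value with the meaning fixed in the schema."""
    match = ELEMENT_NAME.fullmatch(element_name)
    if match is None or match.group(1) != "ms":
        raise ValueError(f"not an ms- element: {element_name!r}")
    meaning = SCHEMA[match.group(2)]  # meaning comes from the schema, never from the instance
    return f"{meaning['label']} ({meaning['concept']}): {value} {units}"


print(interpret("ms-examplecuid0000000000001", "120", "mmHg"))
# Systolic Blood Pressure (SNOMED CT 271649006): 120 mmHg
```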

How Quality Works in Practice

Every Xd type can carry an ExceptionalValue element that explains why a value is absent or unusual:

<BloodType xsi:type="XdStringType">
  <label>Blood Type</label>
  <data/>
  <exceptionValue>unknown</exceptionValue>
  <audit>
    <createdBy>nurse</createdBy>
    <createdOn>2025-11-02T10:00:00Z</createdOn>
    <rationale>Patient unable to recall</rationale>
  </audit>
</BloodType>

This is far more informative than a bare NULL. Downstream systems can distinguish "we asked but don't know" from "the question doesn't apply" from "the value is hidden for privacy."
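A downstream consumer can branch on the recorded reason instead of treating every gap the same way. This Python sketch parses the snippet above with the standard library's `xml.etree.ElementTree`. It adds an `xmlns:xsi` declaration, which a standalone document needs for the `xsi:type` attribute to parse, and the `POLICY` strings are hypothetical examples of what a consumer might do.

```python
import xml.etree.ElementTree as ET

SNIPPET = """\
<BloodType xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="XdStringType">
  <label>Blood Type</label>
  <data/>
  <exceptionValue>unknown</exceptionValue>
  <audit>
    <createdBy>nurse</createdBy>
    <createdOn>2025-11-02T10:00:00Z</createdOn>
    <rationale>Patient unable to recall</rationale>
  </audit>
</BloodType>
"""

# Hypothetical downstream policies keyed by the exceptional-value reason.
POLICY = {
    "unknown": "ask again at next visit",
    "not-applicable": "skip: question does not apply",
    "masked": "request elevated access",
}


def handle(xml_text: str) -> str:
    """Use the value if present; otherwise act on the recorded reason."""
    root = ET.fromstring(xml_text)
    data = (root.findtext("data") or "").strip()
    if data:
        return f"use value {data}"
    reason = (root.findtext("exceptionValue") or "").strip()
    if reason not in POLICY:
        raise ValueError("missing value without a recognized reason")
    rationale = root.findtext("audit/rationale") or "no rationale recorded"
    return f"{POLICY[reason]} ({rationale})"


print(handle(SNIPPET))  # ask again at next visit (Patient unable to recall)
```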

Standards Foundation

SDC4 is built on established W3C open standards:

Standard          Role in SDC4
XML Schema (XSD)  Structural backbone: all models are defined as XSD restrictions
RDF               Semantic annotations: data maps to RDF triples
OWL               Ontology alignment: semantic models are expressible in OWL
SPARQL            Querying: SDC4 data is queryable via SPARQL
SHACL             Validation: constraint shapes for RDF data

By building on these standards, SDC4 works with existing tooling: XML validators, triple stores, SPARQL endpoints, and knowledge graph platforms.

Output Formats

An SDC4 data model can produce multiple output formats:

Format   Purpose
XSD      XML Schema for structural validation
XML      Instance documents with sample data
JSON     Instance data for API development
JSON-LD  Semantic schema for linked data integration
HTML     Human-readable documentation
RDF      Triples for semantic web and knowledge graphs
SHACL    Shape constraints for RDF validation
GQL      CREATE statements for property graph databases

One model definition, eight output formats — each serving a different use case while maintaining the same underlying semantics and governance.
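The "one model, many formats" idea can be sketched by rendering a single in-memory definition two ways: a plain JSON instance, and a JSON-LD rendering whose `@context` ties each key to its concept IRI. The `MODEL` dictionary and the `https://example.org/` type IRI are hypothetical, and this is a simplification: the SDC4 JSON-LD output is described above as a semantic schema, and SDCStudio's actual generators are not shown here.

```python
import json

# Hypothetical single source of truth: one field with its semantic link and a sample value.
MODEL = {
    "name": "Contact",
    "fields": {
        "email": {"concept": "https://schema.org/email", "sample": "alice@example.org"},
    },
}


def to_json(model: dict) -> str:
    """Plain JSON instance: values only, no semantics."""
    return json.dumps({k: f["sample"] for k, f in model["fields"].items()})


def to_jsonld(model: dict) -> str:
    """JSON-LD rendering: the same values plus an @context mapping each key to its concept."""
    doc = {
        "@context": {k: f["concept"] for k, f in model["fields"].items()},
        "@type": f"https://example.org/{model['name']}",
    }
    doc.update({k: f["sample"] for k, f in model["fields"].items()})
    return json.dumps(doc)


print(to_json(MODEL))  # {"email": "alice@example.org"}
print(to_jsonld(MODEL))
```

Both outputs are derived from the same dictionary, so changing a field's concept or sample in one place updates every rendering.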

Data Longevity

A distinctive property of SDC4 is migration avoidance. Because structural components use permanent CUID2 identifiers and semantic meaning is fixed in the schema, data created today remains interpretable indefinitely. New applications can read historical SDC4 data without migration scripts, because:

  • The ms-<cuid> element always has the same structure
  • The mc-<cuid> type definition is immutable
  • The label element provides explicit, permanent meaning

Because historical data never has to be converted to a new schema, this significantly reduces long-term data management costs.

Summary

SDC4 is a framework for creating data models that are:

  • Self-governing: Audit trails, attestation, and provenance travel with the data
  • Self-describing: Semantic links to ontologies make meaning explicit and machine-readable
  • Quality-aware: Missing and exceptional data is handled explicitly, not silently
  • Standards-based: Built on W3C XSD, RDF, OWL, SPARQL, and SHACL
  • Multi-format: One model produces XSD, XML, JSON, JSON-LD, RDF, SHACL, GQL, and HTML
  • Long-lived: Immutable component identifiers and fixed labels avoid costly data migrations

Next Tutorial

SDCStudio Overview: From CSV to Published Schema. See how SDCStudio uses AI to turn a data file into a complete SDC4-compliant model.