Module 3: Components, Clusters, and Publishing

Duration: ~75 minutes self-paced (includes hands-on lab time)
Prerequisites: Modules 1-2
Learning objectives:
- Understand the SDC4 type hierarchy and when to use each data type
- Create and edit components using the Component Wizard
- Organize components into meaningful clusters
- Add semantic links connecting components to ontology concepts
- Publish models and understand versioning
- Generate and deploy applications from published models
- Understand the relationship between components and the minimum knowledge modeling principle

3.1 The SDC4 Type Hierarchy

Every data element in SDCStudio is a component — a single field with a type, a label, validation rules, and semantic links. All SDC4 types inherit from a common base (XdAnyType) which provides shared metadata: label, definition, temporal validity fields, and access control tags.

XdAnyType (base — label, definition, temporal metadata, access control)
├── XdStringType          — general text
├── XdTokenType           — normalized whitespace text
├── XdBooleanType         — true/false
├── XdOrderedType (abstract — ordering semantics)
│   ├── XdOrdinalType     — ordered categories
│   └── XdQuantifiedType (abstract — units, magnitude status)
│       ├── XdCountType   — non-negative integers + units
│       ├── XdQuantityType— decimals + required units
│       ├── XdFloatType   — single-precision float
│       └── XdDoubleType  — double-precision float
├── XdTemporalType        — dates, times, partial dates
├── XdLinkType            — references to other models/resources
└── XdFileType            — binary data / file references

The key distinction: quantified types (XdCount, XdQuantity, XdFloat, XdDouble) require units. Non-quantified types (XdString, XdBoolean, XdTemporal) do not.
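The units rule can be made concrete with a small sketch. The Python classes below are hypothetical stand-ins that mirror part of the tree above; they are not SDCStudio's API, and the field names (`label`, `definition`, `units`, `value`) are assumptions chosen for illustration. The sketch only shows how a "units required" rule can live in the abstract quantified parent so that every quantified subtype inherits it.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class XdAny:
    """Shared metadata carried by every component (hypothetical subset)."""
    label: str
    definition: str = ""


@dataclass
class XdString(XdAny):
    """Non-quantified: no units field at all."""
    value: str = ""


@dataclass
class XdQuantified(XdAny):
    """Abstract parent of the quantified types: units are mandatory."""
    units: str = ""

    def __post_init__(self) -> None:
        if not self.units.strip():
            raise ValueError(f"{type(self).__name__} '{self.label}' requires units")


@dataclass
class XdCount(XdQuantified):
    value: int = 0

    def __post_init__(self) -> None:
        super().__post_init__()  # enforce the inherited units rule first
        if self.value < 0:
            raise ValueError("XdCount values must be non-negative")


@dataclass
class XdQuantity(XdQuantified):
    value: Decimal = Decimal("0")


systolic = XdQuantity(label="Systolic BP", units="mmHg", value=Decimal("120"))
```

In this sketch, `XdCount(label="Attendees", value=3)` raises `ValueError` because no units were given, while `XdString` has no units field to fill in.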

When to use each type

XdString — text data: names, addresses, emails, codes, descriptions. Constraints: min/max length, regex pattern, enumeration.

XdToken — machine-readable identifiers where whitespace should be collapsed and trimmed (ISO country codes, product SKUs, API tokens).
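As a rough illustration of "collapsed and trimmed," the following Python sketch normalizes a raw value the way a token-style field would be expected to store it. The function name `normalize_token` is hypothetical, and the sketch assumes Python's notion of whitespace, which is slightly broader than the space, tab, and line-break characters that XML Schema's token normalization covers.

```python
def normalize_token(raw: str) -> str:
    """Collapse internal whitespace runs to one space and trim both ends."""
    # str.split() with no arguments splits on any whitespace run and
    # discards leading and trailing whitespace.
    return " ".join(raw.split())


country = normalize_token("  US \n")
sku = normalize_token("AB  12\t34")
```

Here `country` is `"US"` and `sku` is `"AB 12 34"`.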

XdBoolean — binary states: yes/no, on/off, consent given/not given.

XdCount — non-negative integers representing countable quantities. Units are required even when obvious: "items," "people," "shares."

XdQuantity — decimal measurements with units. Blood pressure (mmHg), weight (kg), temperature (Celsius). This is where the blood pressure example from Module 1 becomes concrete — the unit and the value are structurally bound together.

XdTemporal — dates, times, and partial dates. Supports ISO 8601 patterns including year-only, year-month, and full datetime with timezone.
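To make "partial dates" concrete, the sketch below classifies a value by its ISO 8601 precision. The patterns and the function name `temporal_precision` are illustrative assumptions, not the constraints SDCStudio generates, and they check only the shape of the value (for example, they would accept 2024-02-31).

```python
import re
from typing import Optional

_DATE = r"\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])"

PATTERNS = {
    "year": re.compile(r"\d{4}"),
    "year-month": re.compile(r"\d{4}-(0[1-9]|1[0-2])"),
    "date": re.compile(_DATE),
    "datetime-tz": re.compile(
        _DATE
        + r"T([01]\d|2[0-3]):[0-5]\d:[0-5]\d(\.\d+)?"  # time, optional fraction
        + r"(Z|[+-]([01]\d|2[0-3]):[0-5]\d)"           # required timezone
    ),
}


def temporal_precision(value: str) -> Optional[str]:
    """Return the name of the first pattern the whole value matches, else None."""
    for name, pattern in PATTERNS.items():
        if pattern.fullmatch(value):
            return name
    return None
```

For example, `temporal_precision("2024-03")` returns `"year-month"`, and `temporal_precision("2024-13")` returns `None`.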

XdOrdinal — ordered categories: severity levels (low/medium/high), satisfaction scores (1-5), education levels.

XdLink — references to other models, resources, or external URIs. Used for relationships between entities.

XdFile — binary data or file references: images, documents, attachments.

Minimum knowledge modeling and type selection

The type you choose is itself a minimum knowledge modeling decision. If a field is "just text," use XdString — do not add quantification, ordering, or temporal semantics it does not need. If a field is a measurement, use XdQuantity — do not model it as XdString with "120 mmHg" as the value, because the unit constraint would be in the text, not in the structure.
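The difference between a unit "in the text" and a unit "in the structure" can be sketched in a few lines of Python. Everything here is hypothetical (the `Quantity` tuple, the `ALLOWED_UNITS` set, and the 20-character limit); the point is only that a string field can enforce text-level rules, while a structured field can constrain the unit directly.

```python
from decimal import Decimal
from typing import NamedTuple


class Quantity(NamedTuple):
    magnitude: Decimal
    units: str


ALLOWED_UNITS = {"mmHg"}  # illustrative constraint for a blood pressure field


def text_is_valid(value: str) -> bool:
    # A plain string field can only check text-level rules such as length.
    return 0 < len(value) <= 20


def quantity_is_valid(q: Quantity) -> bool:
    # The unit is its own field, so the constraint applies to it directly.
    return q.units in ALLOWED_UNITS and q.magnitude >= 0
```

With these definitions, `"120 mmHg"`, `"120 kPa"`, and `"about 120"` all pass `text_is_valid`, whereas `quantity_is_valid` accepts `Quantity(Decimal("120"), "mmHg")` and rejects `Quantity(Decimal("120"), "kPa")`.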

The SDC4 type hierarchy is designed so that each type captures exactly what is essential about the data element and nothing more. The types are the minimum knowledge building blocks.

3.2 Creating Components with the Wizard

The Component Wizard walks you through creating a new component step by step.

Navigate to your project's Components view
Click Create Component
The wizard guides you through:
- Type selection: Choose from the type hierarchy
- Label and description: Human-readable name and purpose
- Validation rules: Type-specific constraints (regex for XdString, units for XdQuantity, etc.)
- Semantic links: Ontology concept bindings (see §3.4 below)
Review the summary and save

Editing existing components

Click on any component to open the stepped edit form:

Step 1: Basic properties (label, description, type)
Step 2: Validation rules and constraints
Step 3: Semantic links (RDF predicate-object pairs)
Step 4: Review and save

Every edit is versioned. The component's CUID2 identity remains stable across edits — only the version changes. This is the structural identity / descriptive label decoupling from Module 1 in practice: you can rename a component, change its description, even change its validation rules, and every system referencing it by CUID2 still finds the same entity.
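The identity/version split described above might be modeled as follows. This is a minimal sketch, not SDCStudio's data model: `ComponentVersion`, `edit`, and the field names are hypothetical, and the `cuid2` value is a readable placeholder rather than a real CUID2.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ComponentVersion:
    cuid2: str         # stable identity, assigned once
    version: int       # incremented on every edit
    label: str
    description: str = ""
    pattern: str = ""  # e.g. a regex validation rule


def edit(current: ComponentVersion, **changes: str) -> ComponentVersion:
    """Return the next version; identity and version are not caller-editable."""
    if {"cuid2", "version"} & changes.keys():
        raise ValueError("cuid2 and version are managed, not edited")
    return replace(current, version=current.version + 1, **changes)


v1 = ComponentVersion(cuid2="placeholder-id-0001", version=1, label="phone")
v2 = edit(v1, label="Primary phone", pattern=r"\d{10}")
```

After the edit, `v2.cuid2` equals `v1.cuid2` while `v2.version` is 2, and `v1` itself is unchanged because each version is an immutable value.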

3.3 Organizing with Clusters

A cluster is a logical grouping of related components within a data model. Clusters serve two purposes:

Organization: Group components that belong together conceptually (e.g., "Patient Demographics," "Contact Information," "Clinical Measurements")
Composition: The structure of clusters IS the composition in the minimum knowledge modeling sense — you compose minimum components into meaningful groups at use time, not by inflating individual components

Creating and managing clusters

In your Data Model view, click Add Cluster
Name the cluster (e.g., "Patient Demographics")
Add a description of what the cluster represents
Drag or assign components into the cluster

Cluster design principles

One concept per cluster: A cluster should represent one coherent domain concept, not a grab-bag of related fields
Composable across models: The same component can appear in multiple clusters across different models. The component is the reusable atom; the cluster is the molecule
Flat where possible: Avoid deeply nested clusters unless the domain genuinely requires hierarchy. One level of nesting covers 90% of use cases.
Named for the domain expert: Cluster names should be meaningful to the person who understands the data, not to the database administrator. Prefer "Blood Pressure Measurement" over "BP_CLUST_01".
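One way to picture the atom/molecule relationship is a cluster that holds references to components rather than copies of them. The sketch below assumes a hypothetical `Cluster` structure and made-up component identifiers; it is not how SDCStudio stores clusters.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    name: str
    description: str = ""
    # Components are referenced by identifier, so one component can be
    # listed in any number of clusters without being duplicated.
    component_ids: list[str] = field(default_factory=list)


def clusters_using(component_id: str, clusters: list[Cluster]) -> list[str]:
    """Names of the clusters that reference the given component."""
    return [c.name for c in clusters if component_id in c.component_ids]


demographics = Cluster("Patient Demographics", "Who the patient is",
                       ["id-name", "id-birth-date"])
contact = Cluster("Contact Information", "How to reach the patient",
                  ["id-name", "id-phone", "id-email"])

shared = clusters_using("id-name", [demographics, contact])
```

Here `shared` lists both cluster names, because the same component identifier appears in each flat, single-concept cluster.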

3.4 Semantic Linking — Adding Meaning to Your Data

Semantic links connect your components to published ontology concepts. This is what makes the data self-describing across system boundaries.

A semantic link is an RDF predicate-object pair attached to a component:

Predicate: The relationship type (e.g., rdf:type, rdfs:subClassOf, skos:exactMatch)
Object: The ontology concept being referenced (e.g., snomed:271649006 for systolic blood pressure)
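Read as data, a semantic link is one RDF triple: the component is the subject, followed by the predicate and the object. The sketch below expands the compact names used above into full IRIs. The `rdf`, `rdfs`, and `skos` namespaces are the standard W3C ones and `http://snomed.info/id/` is the SNOMED CT URI namespace; the component IRI and the helper names are hypothetical, and how SDCStudio binds prefixes internally is an assumption here.

```python
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "snomed": "http://snomed.info/id/",
}


def expand(curie: str) -> str:
    """Expand a prefix:local name into a full IRI using PREFIXES."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in PREFIXES or not local:
        raise ValueError(f"cannot expand {curie!r}")
    return PREFIXES[prefix] + local


def semantic_link(component_iri: str, predicate: str, obj: str) -> tuple[str, str, str]:
    """Build the (subject, predicate, object) triple for one semantic link."""
    return (component_iri, expand(predicate), expand(obj))


link = semantic_link(
    "urn:example:component:systolic-bp",  # placeholder subject IRI
    "skos:exactMatch",
    "snomed:271649006",
)
```

The resulting triple's object is `http://snomed.info/id/271649006`, an IRI that any SNOMED-aware system can look up independently.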

Adding semantic links in SDCStudio

Open a component for editing
Navigate to the Semantic Links step
Click Add Link
Select a predicate from the dropdown (common predicates are pre-loaded)
Search for or paste the object URI (SDCStudio searches your uploaded ontologies and built-in standards)
Save

Why semantic links matter

Without semantic links, a component called "blood_pressure" is just a label. With a semantic link to snomed:271649006, the component's meaning is machine-resolvable to the globally recognized SNOMED CT concept for systolic blood pressure. Any system that speaks SNOMED CT can verify what the component means without asking the author.

This is the W3C standards answer to Robert Vane's boundary problem: the meaning is not "inferred from relationships" — it is structurally bound to a globally resolvable ontology concept. The boundary becomes verification, not translation.

How many links does a component need?

Apply the minimum knowledge modeling principle: link to what is essential to identify the concept and distinguish it from its neighbors. A blood pressure measurement needs a SNOMED CT link. It does not need a link to every possible medical ontology that has a concept for blood pressure. One authoritative link is better than five redundant ones.

3.5 Publishing and Versioning

Publishing a model

Publishing locks the current version of a model and makes it available for output generation, component reuse, and deployment.

In your Data Model view, click Publish
Review the model summary (components, clusters, semantic links)
Confirm publication
Status changes to PUBLISHED

Versioning

Every published model is immutable — it cannot be changed after publication. To evolve a model:

Create a new version (SDCStudio provides a Create New Version action on published models)
Edit the new version
Publish the new version

Old versions remain accessible by their CUID2 + version identifier. This is why schema evolution works without data migration: old data validates against the version that was active when it was written, and new data validates against the current version. Both coexist.
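The coexistence of versions can be sketched as a lookup keyed by identity plus version. In this hypothetical Python example, each record carries the model identifier and version it was written under, and validation uses exactly that published rule; the identifier, the record layout, and the phone rules are placeholders, not SDCStudio output.

```python
import re

# Published rules are immutable and keyed by (model identity, version).
PUBLISHED = {
    ("placeholder-model-id", 1): re.compile(r"\d{10}"),
    ("placeholder-model-id", 2): re.compile(r"\d{3}-\d{3}-\d{4}"),
}


def is_valid(record: dict) -> bool:
    """Validate a record against the version it declares, not the latest one."""
    rule = PUBLISHED[(record["model"], record["version"])]
    return rule.fullmatch(record["phone"]) is not None


old_record = {"model": "placeholder-model-id", "version": 1, "phone": "5551234567"}
new_record = {"model": "placeholder-model-id", "version": 2, "phone": "555-123-4567"}
```

Both records validate as written. The old record would fail only if it were forced through the version 2 rule, which is the migration step this design avoids.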

What publishing generates: the context graph

When you publish a model, SDCStudio generates 8 output formats (XSD, XML, JSON, JSON-LD, HTML, RDF/Turtle, SHACL, GQL). The RDF, OWL, and SHACL outputs ARE the client's context graph. Every component you minted, every semantic link you defined, and every vocabulary binding becomes a node or relationship in a live knowledge graph.

This is not a separate step. You did not "build a knowledge graph" as a project. You built a data model, and the graph was generated as a structural property of the modeling process. Every governance component (workflow models, attestation, party/role, provenance) is also present in the graph. Decision traces are queryable because they are structurally defined, not logged after the fact.

When clients or prospects ask about "context graphs" or "knowledge graphs" or "decision traces," this is what you deliver. The vocabulary is new. What SDC produces is not.

The governance components you publish (workflow state machines, attestation models, party/role definitions) are also consumable by runtime enforcement engines. When a client needs execution-time enforcement, where the system verifies that a state change is admissible before allowing it to happen, the models you built in SDCStudio can be handed off directly. Your modeling work becomes the contract that the execution engine enforces. See Module 7 for details on the execution handoff pattern.

3.6 Generating Applications

Once your model is published, SDCStudio can generate a complete deployable application — a Django web application with a REST API, admin interface, validation layer, and database schema, packaged as a Docker Compose deployment.

What the generated application includes

Django models: Database tables matching your data model structure
REST API: Endpoints for reading and writing data, with schema validation at the boundary
Admin interface: Django admin for data management
Validation layer: XSD 1.1 + Schematron constraints enforced at every write
Docker Compose: Complete deployment with PostgreSQL, ready to run on the client's hardware

How to generate

Navigate to your published model
Click Generate Application
Configure deployment options (lightweight vs enterprise stack)
Click Generate
Download the application bundle

The generated application runs on the client's own infrastructure — a Linux box, a private cloud VM, or anywhere Docker runs. The client's data never leaves their environment. There is no SaaS dependency for the deployed application.

This is what practitioners deploy for clients. The bespoke application is the deliverable. The per-touch stewardship model (Module 9) is how practitioners maintain and evolve it over time.

3.7 The Component Reuse Catalog

When you publish a model, the components you created are added to the SDCStudio component catalog. The next time you (or another practitioner) create a model with similar data shapes, the catalog suggests matching existing components, which can be reused for free instead of being minted as new (billable) components.

This is the mechanism behind the practitioner economics compounding:
- First engagement: most components are new → real minting costs
- Second engagement in the same vertical: many components match → free reuse
- Fifth engagement: nearly everything matches → almost zero minting cost
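The compounding effect is simple set arithmetic, sketched below. This is a deliberately simplified model: it matches components by exact name, whereas the catalog is described as suggesting matches on similar data shapes, and the component names and the `run_engagement` helper are hypothetical.

```python
def run_engagement(needed: set[str], catalog: set[str]) -> tuple[int, int]:
    """Return (minted, reused) and add newly minted components to the catalog."""
    minted = needed - catalog   # not in the catalog yet: billable
    reused = needed & catalog   # already published: free
    catalog |= minted           # publishing makes them reusable next time
    return len(minted), len(reused)


catalog: set[str] = set()
first = run_engagement({"client_name", "email", "phone", "case_number"}, catalog)
second = run_engagement({"client_name", "email", "phone", "matter_type"}, catalog)
third = run_engagement({"client_name", "email", "case_number", "matter_type"}, catalog)
```

In this toy run, the first engagement mints all four components, the second mints one and reuses three, and the third mints none.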

The catalog is the resolution mechanism. When two systems use the same CUID2-identified component from the same catalog, meaning IS resolved to a single path. The constraint makes the shared identity enforceable at runtime.

Module 3 Exercise

Continuing with the "Atlas Legal Lab" project from Module 2:

Open the model you created from clients.csv
Create a new cluster called "Client Identity" and assign the name, email, and phone components to it
Create a new cluster called "Case Information" and assign the case_number, matter_type, and status components
Add a semantic link to the client_name component: predicate rdf:type, object schema:Person (schema.org Person)
Edit the phone component — change its validation to include a regex pattern for US phone numbers
Publish the model
Generate the XSD Schema and compare it to the one you generated in Module 2 — what changed?
(Optional) Generate the JSON-LD output and open it in a text editor. Find the CUID2 identifiers and the semantic links you added.

This exercise takes approximately 30 minutes. The goal is to experience the difference between an AI-generated model (Module 2) and a practitioner-refined model (this module).
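For the phone validation step, it can help to test a candidate pattern outside SDCStudio first. The pattern below is one possible starting point, not a prescribed answer: it accepts common US formats but does not enforce numbering-plan rules such as which digits may start an area code. It avoids non-capturing groups and is checked with `fullmatch`, because XSD patterns are implicitly anchored to the whole value.

```python
import re

# Optional +1 prefix, area code with or without parentheses,
# and optional space, dot, or hyphen separators.
US_PHONE = re.compile(r"(\+1[ .\-]?)?(\(\d{3}\)|\d{3})[ .\-]?\d{3}[ .\-]?\d{4}")


def is_us_phone(value: str) -> bool:
    """True when the entire value matches the candidate pattern."""
    return US_PHONE.fullmatch(value) is not None
```

Values such as `(555) 123-4567`, `555.123.4567`, and `+1 555 123 4567` pass, while `555-1234` does not.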