Module 3: Components, Clusters, and Publishing
Duration: ~75 minutes self-paced (includes hands-on lab time)
Prerequisites: Modules 1-2
Learning objectives:
- Understand the SDC4 type hierarchy and when to use each data type
- Create and edit components using the Component Wizard
- Organize components into meaningful clusters
- Add semantic links connecting components to ontology concepts
- Publish models and understand versioning
- Generate and deploy applications from published models
- Understand the relationship between components and the minimum knowledge modeling principle
3.1 The SDC4 Type Hierarchy
Every data element in SDCStudio is a component — a single field with a type, a label, validation rules, and semantic links. All SDC4 types inherit from a common base (XdAnyType) which provides shared metadata: label, definition, temporal validity fields, and access control tags.
XdAnyType (base: label, definition, temporal metadata, access control)
├── XdStringType: general text
├── XdTokenType: normalized whitespace text
├── XdBooleanType: true/false
├── XdOrderedType (abstract: ordering semantics)
│   ├── XdOrdinalType: ordered categories
│   └── XdQuantifiedType (abstract: units, magnitude status)
│       ├── XdCountType: non-negative integers + units
│       ├── XdQuantityType: decimals + required units
│       ├── XdFloatType: single-precision float
│       └── XdDoubleType: double-precision float
├── XdTemporalType: dates, times, partial dates
├── XdLinkType: references to other models/resources
└── XdFileType: binary data / file references
The key distinction: quantified types (XdCount, XdQuantity, XdFloat, XdDouble) require units. Non-quantified types (XdString, XdBoolean, XdTemporal) do not.
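One way to picture the hierarchy is as a small set of Python classes. This is only an illustrative sketch, not SDCStudio's implementation: the class names are shortened, the fields are simplified, and the rule that a quantified value is rejected when its units are empty is an assumption drawn from the "require units" statement above.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class XdAny:
    """Simplified stand-in for XdAnyType: metadata shared by every type."""
    label: str
    definition: str = ""


@dataclass(frozen=True)
class XdString(XdAny):
    value: str = ""


@dataclass(frozen=True)
class XdOrdered(XdAny):
    """Abstract in SDC4: marks types whose values can be ordered."""


@dataclass(frozen=True)
class XdQuantified(XdOrdered):
    """Abstract in SDC4: every quantified value carries units."""
    units: str = ""

    def __post_init__(self) -> None:
        # Assumed rule for this sketch: blank units are an error.
        if not self.units.strip():
            raise ValueError(f"{type(self).__name__} '{self.label}' requires units")


@dataclass(frozen=True)
class XdCount(XdQuantified):
    value: int = 0

    def __post_init__(self) -> None:
        super().__post_init__()
        if self.value < 0:
            raise ValueError("XdCount values must be non-negative")


@dataclass(frozen=True)
class XdQuantity(XdQuantified):
    value: Decimal = Decimal("0")


systolic = XdQuantity(label="Systolic blood pressure", units="mmHg", value=Decimal("120"))
client_name = XdString(label="Client name", value="Ada Lovelace")
```

Here `systolic` cannot exist without its `units`, while `client_name` carries no unit field at all. Creating `XdCount(label="Visits", value=3)` with no units raises a `ValueError` in this sketch.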
When to use each type
XdString — text data: names, addresses, emails, codes, descriptions. Constraints: min/max length, regex pattern, enumeration.
XdToken — machine-readable identifiers where whitespace should be collapsed and trimmed (ISO country codes, product SKUs, API tokens).
XdBoolean — binary states: yes/no, on/off, consent given/not given.
XdCount — non-negative integers representing countable quantities. Units are required even when obvious: "items," "people," "shares."
XdQuantity — decimal measurements with units. Blood pressure (mmHg), weight (kg), temperature (Celsius). This is where the blood pressure example from Module 1 becomes concrete — the unit and the value are structurally bound together.
XdTemporal — dates, times, and partial dates. Supports ISO 8601 patterns including year-only, year-month, and full datetime with timezone.
XdOrdinal — ordered categories: severity levels (low/medium/high), satisfaction scores (1-5), education levels.
XdLink — references to other models, resources, or external URIs. Used for relationships between entities.
XdFile — binary data or file references: images, documents, attachments.
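To make the XdString versus XdToken distinction concrete, the sketch below shows what "collapsed and trimmed" whitespace means. It assumes XdToken behaves like the XML Schema token datatype (runs of space, tab, line feed, and carriage return become a single space, and leading and trailing spaces are removed); the function name is hypothetical and is not part of SDCStudio.

```python
import re

# XML Schema treats space, tab, line feed, and carriage return as whitespace.
_XML_WHITESPACE = re.compile(r"[ \t\n\r]+")


def collapse_whitespace(raw: str) -> str:
    """Collapse each run of XML whitespace to one space, then trim both ends."""
    return _XML_WHITESPACE.sub(" ", raw).strip(" ")


country = collapse_whitespace("  US \n")
sku = collapse_whitespace("\tAB  12\n34 ")
print(repr(country), repr(sku))
```

An XdString would keep the original spacing as entered. For identifiers that other systems compare character by character, the collapsed form is usually the safer one to store.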
Minimum knowledge modeling and type selection
The type you choose is itself a minimum knowledge modeling decision. If a field is "just text," use XdString — do not add quantification, ordering, or temporal semantics it does not need. If a field is a measurement, use XdQuantity — do not model it as XdString with "120 mmHg" as the value, because the unit constraint would be in the text, not in the structure.
The SDC4 type hierarchy is designed so that each type captures exactly what is essential about the data element and nothing more. The types are the minimum knowledge building blocks.
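The difference between keeping the unit "in the text" and "in the structure" can be seen in a few lines. This is a hedged illustration only: the `Measurement` class, the allowed-unit set, and the 20-character length check are all hypothetical stand-ins, not SDC4 definitions.

```python
from dataclasses import dataclass
from decimal import Decimal

# Hypothetical unit constraint for a pressure component.
ALLOWED_UNITS = {"mmHg", "kPa"}


@dataclass(frozen=True)
class Measurement:
    value: Decimal
    units: str


def unit_errors(measurement: Measurement) -> list:
    """Return a list of problems with the unit; empty when the unit is allowed."""
    if measurement.units in ALLOWED_UNITS:
        return []
    return [f"unknown unit {measurement.units!r}; expected one of {sorted(ALLOWED_UNITS)}"]


# Modeled as text: the misspelled unit is just more characters,
# so a typical string check (type and length) accepts it.
as_text = "120 mmgh"
text_is_valid = isinstance(as_text, str) and len(as_text) <= 20

# Modeled as a quantity: the unit is its own field and can be checked.
as_quantity = Measurement(Decimal("120"), "mmgh")
print(text_is_valid, unit_errors(as_quantity))
```

The text version passes validation with a misspelled unit, while the structured version reports the problem. That is the practical cost of choosing XdString for a measurement.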
3.2 Creating Components with the Wizard
The Component Wizard walks you through creating a new component step by step.
- Navigate to your project's Components view
- Click Create Component
- The wizard guides you through:
  - Type selection: Choose from the type hierarchy
  - Label and description: Human-readable name and purpose
  - Validation rules: Type-specific constraints (regex for XdString, units for XdQuantity, etc.)
  - Semantic links: Ontology concept bindings (see §3.4 below)
- Review the summary and save
Editing existing components
Click on any component to open the stepped edit form:
- Step 1: Basic properties (label, description, type)
- Step 2: Validation rules and constraints
- Step 3: Semantic links (RDF predicate-object pairs)
- Step 4: Review and save
Every edit is versioned. The component's CUID2 identity remains stable across edits — only the version changes. This is the structural identity / descriptive label decoupling from Module 1 in practice: you can rename a component, change its description, even change its validation rules, and every system referencing it by CUID2 still finds the same entity.
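The "stable identity, changing version" behavior can be sketched as follows. The integer version counter, the `edit` helper, and the placeholder identifier string are assumptions for illustration; how SDCStudio actually numbers versions is not stated here.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Component:
    cuid2: str          # stable structural identity
    version: int        # changes on every edit (integer counter assumed)
    label: str          # descriptive, free to change
    description: str = ""


def edit(component: Component, **changes) -> Component:
    """Return a new version of the component; identity is never editable."""
    if "cuid2" in changes or "version" in changes:
        raise ValueError("cuid2 and version cannot be edited directly")
    return replace(component, version=component.version + 1, **changes)


# Placeholder identifier, not a real SDCStudio component ID.
v1 = Component(cuid2="tz4a98xxat96iws9zmbrgj3a", version=1, label="phone")
v2 = edit(v1, label="Phone number", description="Primary contact number")
print(v1.cuid2 == v2.cuid2, v1.version, v2.version)
```

A system that stored only the `cuid2` still resolves to the same entity after the rename. A system that stored the `cuid2` together with the version can still retrieve the original wording.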
3.3 Organizing with Clusters
A cluster is a logical grouping of related components within a data model. Clusters serve two purposes:
- Organization: Group components that belong together conceptually (e.g., "Patient Demographics," "Contact Information," "Clinical Measurements")
- Composition: The structure of clusters IS the composition in the minimum knowledge modeling sense — you compose minimum components into meaningful groups at use time, not by inflating individual components
Creating and managing clusters
- In your Data Model view, click Add Cluster
- Name the cluster (e.g., "Patient Demographics")
- Add a description of what the cluster represents
- Drag or assign components into the cluster
Cluster design principles
- One concept per cluster: A cluster should represent one coherent domain concept, not a grab-bag of related fields
- Composable across models: The same component can appear in multiple clusters across different models. The component is the reusable atom; the cluster is the molecule
- Flat where possible: Avoid deeply nested clusters unless the domain genuinely requires hierarchy. One level of nesting covers 90% of use cases.
- Named for the domain expert: Cluster names should be meaningful to the person who understands the data, not to the database administrator. "Blood Pressure Measurement" not "BP_CLUST_01"
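The "atom and molecule" relationship can be sketched with two small classes. The names and identifiers below are hypothetical placeholders, and the structure is deliberately flat (a cluster holds components, nothing deeper) to mirror the "flat where possible" principle.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Component:
    cuid2: str   # placeholder identifiers are used in this sketch
    label: str


@dataclass
class Cluster:
    name: str
    description: str = ""
    components: list = field(default_factory=list)


full_name = Component("cmp-placeholder-0001", "Full name")
birth_date = Component("cmp-placeholder-0002", "Date of birth")
email = Component("cmp-placeholder-0003", "Email")

demographics = Cluster("Patient Demographics", "Who the patient is", [full_name, birth_date])
contact = Cluster("Contact Information", "How to reach the patient", [full_name, email])


def clusters_using(component: Component, clusters: list) -> list:
    """List the names of clusters that include the given component."""
    return [c.name for c in clusters if component in c.components]


print(clusters_using(full_name, [demographics, contact]))
```

The same `full_name` component appears in both clusters without being copied or redefined, which is the reuse the design principles describe.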
3.4 Semantic Linking — Adding Meaning to Your Data
Semantic links connect your components to published ontology concepts. This is what makes the data self-describing across system boundaries.
A semantic link is an RDF predicate-object pair attached to a component:
- Predicate: The relationship type (e.g., rdf:type, rdfs:subClassOf, skos:exactMatch)
- Object: The ontology concept being referenced (e.g., snomed:271649006 for systolic blood pressure)
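A predicate-object pair attached to a component amounts to one RDF triple whose subject is the component. The sketch below expands the prefixed names into full URIs using only the standard library. The rdf, rdfs, and skos namespaces are the standard W3C ones and `http://snomed.info/id/` is the SNOMED CT URI base; whether SDCStudio binds the `snomed` prefix to that base, and the component URI used as the subject, are assumptions for illustration.

```python
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "snomed": "http://snomed.info/id/",  # assumed binding for this sketch
}


def expand(prefixed_name: str) -> str:
    """Turn a prefixed name such as 'skos:exactMatch' into a full URI."""
    prefix, _, local = prefixed_name.partition(":")
    return PREFIXES[prefix] + local


def semantic_link(component_uri: str, predicate: str, obj: str) -> tuple:
    """Build the (subject, predicate, object) triple for one semantic link."""
    return (component_uri, expand(predicate), expand(obj))


# Hypothetical subject URI for a component.
component_uri = "urn:example:component:tz4a98xxat96iws9zmbrgj3a"
triple = semantic_link(component_uri, "skos:exactMatch", "snomed:271649006")
print(triple)
```

Once expanded, the object is a globally unique URI rather than a local label, which is what lets another system check the meaning independently.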
Adding semantic links in SDCStudio
- Open a component for editing
- Navigate to the Semantic Links step
- Click Add Link
- Select a predicate from the dropdown (common predicates are pre-loaded)
- Search for or paste the object URI (SDCStudio searches your uploaded ontologies and built-in standards)
- Save
Why semantic links matter
Without semantic links, a component called "blood_pressure" is just a label. With a semantic link to snomed:271649006, the component's meaning is machine-resolvable to the globally recognized SNOMED CT concept for systolic blood pressure. Any system that speaks SNOMED CT can verify what the component means without asking the author.
This is the W3C standards answer to Robert Vane's boundary problem: the meaning is not "inferred from relationships" — it is structurally bound to a globally resolvable ontology concept. The boundary becomes verification, not translation.
How many links does a component need?
Apply the minimum knowledge modeling principle: link to what is essential to identify the concept and distinguish it from its neighbors. A blood pressure measurement needs a SNOMED CT link. It does not need a link to every possible medical ontology that has a concept for blood pressure. One authoritative link is better than five redundant ones.
3.5 Publishing and Versioning
Publishing a model
Publishing locks the current version of a model and makes it available for output generation, component reuse, and deployment.
- In your Data Model view, click Publish
- Review the model summary (components, clusters, semantic links)
- Confirm publication
- Status changes to PUBLISHED
Versioning
Every published model is immutable — it cannot be changed after publication. To evolve a model:
- Create a new version (SDCStudio provides a Create New Version action on published models)
- Edit the new version
- Publish the new version
Old versions remain accessible by their CUID2 + version identifier. This is why schema evolution works without data migration - old data validates against the version that was active when it was written, and new data validates against the current version. Both coexist.
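The coexistence of old and new data can be sketched as a lookup keyed by identifier and version. Everything here is hypothetical: the identifier, the two validation rules, and the record layout are stand-ins chosen to show the idea, not what SDCStudio generates.

```python
import re

COMPONENT_ID = "phone-component-placeholder"  # stands in for a real CUID2


def _non_empty_text(value) -> bool:
    """Version 1 rule (assumed): any non-blank text."""
    return isinstance(value, str) and value.strip() != ""


def _us_phone(value) -> bool:
    """Version 2 rule (assumed): digits in the form 555-010-1234."""
    return isinstance(value, str) and re.fullmatch(r"\d{3}-\d{3}-\d{4}", value) is not None


# Each published version stays available under (identifier, version).
VALIDATORS = {
    (COMPONENT_ID, 1): _non_empty_text,
    (COMPONENT_ID, 2): _us_phone,
}


def is_valid(record: dict) -> bool:
    """Validate a record against the version it was written under."""
    validator = VALIDATORS[(record["cuid2"], record["version"])]
    return validator(record["value"])


old_record = {"cuid2": COMPONENT_ID, "version": 1, "value": "ask for reception"}
new_record = {"cuid2": COMPONENT_ID, "version": 2, "value": "555-010-1234"}
print(is_valid(old_record), is_valid(new_record))
```

The old record would fail the version 2 rule, but it is never asked to pass it. Because each record names its own version, tightening a rule does not force a rewrite of existing data.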
What publishing generates: the context graph
When you publish a model, SDCStudio generates 8 output formats (XSD, XML, JSON, JSON-LD, HTML, RDF/Turtle, SHACL, GQL). The RDF, OWL, and SHACL outputs ARE the client's context graph. Every component you minted, every semantic link you defined, every vocabulary binding - they all become nodes and relationships in a live knowledge graph.
This is not a separate step. You did not "build a knowledge graph" as a project. You built a data model, and the graph was generated as a structural property of the modeling process. Every governance component (workflow models, attestation, party/role, provenance) is also present in the graph. Decision traces are queryable because they are structurally defined, not logged after the fact.
When clients or prospects ask about "context graphs" or "knowledge graphs" or "decision traces," this is what you deliver. The vocabulary is new. What SDC produces is not.
The governance components you publish (workflow state machines, attestation models, party/role definitions) are also consumable by runtime enforcement engines. When a client needs execution-time enforcement - where the system verifies that a state change is admissible before allowing it to happen - the models you built in SDCStudio can be handed off directly. Your modeling work becomes the contract that the execution engine enforces. See Module 7 for details on the execution handoff pattern.
3.6 Generating Applications
Once your model is published, SDCStudio can generate a complete deployable application — a Django web application with a REST API, admin interface, validation layer, and database schema, packaged as a Docker Compose deployment.
What the generated application includes
- Django models: Database tables matching your data model structure
- REST API: Endpoints for reading and writing data, with schema validation at the boundary
- Admin interface: Django admin for data management
- Validation layer: XSD 1.1 + Schematron constraints enforced at every write
- Docker Compose: Complete deployment with PostgreSQL, ready to run on the client's hardware
How to generate
- Navigate to your published model
- Click Generate Application
- Configure deployment options (lightweight vs enterprise stack)
- Click Generate
- Download the application bundle
The generated application runs on the client's own infrastructure — a Linux box, a private cloud VM, or anywhere Docker runs. The client's data never leaves their environment. There is no SaaS dependency for the deployed application.
This is what practitioners deploy for clients. The bespoke application is the deliverable. The per-touch stewardship model (Module 9) is how practitioners maintain and evolve it over time.
3.7 The Component Reuse Catalog
When you publish a model, the components you created are added to the SDCStudio component catalog. The next time you (or another practitioner) creates a model with similar data shapes, the catalog suggests matching existing components — which can be reused for free instead of minted as new (billable) components.
This is the mechanism behind the practitioner economics compounding:
- First engagement: most components are new → real minting costs
- Second engagement in the same vertical: many components match → free reuse
- Fifth engagement: nearly everything matches → almost zero minting cost
The catalog is the resolution mechanism. When two systems use the same CUID2-identified component from the same catalog, meaning IS resolved to a single path. The constraint makes the shared identity enforceable at runtime.
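The compounding effect can be sketched with a toy catalog. The matching rule here (exact label and type) is an assumption made for simplicity; the text only says the catalog suggests components with similar data shapes, and the real matching logic is not described.

```python
def plan_engagement(needed: list, catalog: set) -> tuple:
    """Split the needed (label, type) shapes into reused and newly minted.

    Newly minted shapes are added to the catalog so later engagements can reuse them.
    """
    reused = [shape for shape in needed if shape in catalog]
    minted = [shape for shape in needed if shape not in catalog]
    catalog.update(minted)
    return reused, minted


catalog = set()

first = [("client_name", "XdString"), ("email", "XdString"), ("case_number", "XdToken")]
second = [("client_name", "XdString"), ("email", "XdString"), ("matter_type", "XdString")]

first_reused, first_minted = plan_engagement(first, catalog)
second_reused, second_minted = plan_engagement(second, catalog)
print(len(first_minted), len(second_minted))
```

In this toy run the first engagement mints all three components and the second mints only one, because two of its shapes are already in the catalog.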
Module 3 Exercise
Continuing with the "Atlas Legal Lab" project from Module 2:
- Open the model you created from clients.csv
- Create a new cluster called "Client Identity" and assign the name, email, and phone components to it
- Create a new cluster called "Case Information" and assign the case_number, matter_type, and status components
- Add a semantic link to the client_name component: predicate rdf:type, object schema:Person (schema.org Person)
- Edit the phone component: change its validation to include a regex pattern for US phone numbers
- Publish the model
- Generate the XSD Schema and compare it to the one you generated in Module 2 — what changed?
- (Optional) Generate the JSON-LD output and open it in a text editor. Find the CUID2 identifiers and the semantic links you added.
This exercise takes approximately 30 minutes. The goal is to experience the difference between an AI-generated model (Module 2) and a practitioner-refined model (this module).
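For the phone validation step, it can help to try a candidate pattern against sample values before pasting it into the component. The pattern below is one possible choice, not a required answer. It assumes the whole value must match (XSD-style patterns are implicitly anchored, which `re.fullmatch` imitates), and it does not accept country codes or extensions.

```python
import re

# Candidate pattern: (555) 010-1234, 555-010-1234, 555.010.1234, or 5550101234.
US_PHONE_PATTERN = r"(\(\d{3}\) ?|\d{3}[-. ]?)\d{3}[-. ]?\d{4}"


def matches_us_phone(value: str) -> bool:
    """Return True when the entire value matches the candidate pattern."""
    return re.fullmatch(US_PHONE_PATTERN, value) is not None


samples = ["(555) 010-1234", "555-010-1234", "5550101234", "555-0101", "+1 555 010 1234"]
for sample in samples:
    print(sample, matches_us_phone(sample))
```

If your client data includes international numbers or extensions, widen the pattern or leave the field less constrained. A rule that rejects real data is worse than a loose one.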
Further Reading
- Understanding Components and Data Types (SDCStudio Tutorial 4)
- Creating Components with the Wizard (SDCStudio Tutorial 5)
- Organizing with Clusters (SDCStudio Tutorial 7)
- Semantic Linking (SDCStudio Tutorial 8)
- Publishing and Generating Outputs (SDCStudio Tutorial 9)
- App Generation (SDCStudio User Guide)
- XD Types Reference (SDCStudio Reference)