
Module 2: SDCStudio - Your First Data Model

Duration: ~60 minutes self-paced (includes hands-on lab time)
Prerequisites: Module 1

Learning Objectives

  • Create an SDCStudio account and fund your wallet.
  • Navigate the SDCStudio interface (projects, data sources, models, components).
  • Configure settings and upload domain ontologies.
  • Upload a CSV file and observe the two-stage AI processing pipeline.
  • Review an AI-generated data model and understand what was created.
  • Publish a model and generate outputs in multiple formats.

2.1 Getting Started with SDCStudio [Core]

SDCStudio is a cloud-based platform - there is nothing to install. Open your browser and navigate to sdcstudio.axius-sdc.com.

Creating your account and funding your wallet

  1. Click Sign Up and create your account with an email address
  2. Navigate to Wallet and fund it with the $10 minimum
  3. Once your wallet is funded, the Practitioner Curriculum link appears in your user menu

The $10 is your own wallet funding - it stays yours and is used for component minting on your engagements. It is not a program fee.


2.2 Navigating the Interface [Core]

SDCStudio is a React single-page application. The main navigation provides access to:

  • Dashboard - overview of your projects and recent activity
  • Projects - create and manage projects (each project is a container for related models)
  • Data Sources - view uploaded files and processing status
  • Data Models - browse and edit your data models
  • Components - manage reusable data components
  • Settings - configure profile, ontologies, and preferences

Key Interface Features

  • Real-time updates: The interface refreshes automatically as AI processes your data
  • Status indicators: Color-coded badges show processing progress (UPLOADING → PARSING → PARSED → AGENT_PROCESSING → COMPLETED)
  • Contextual actions: Buttons and menus appear based on what you are viewing
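The status badges advance through a fixed sequence. The sketch below models that sequence as a small state machine so you can see which stage each badge belongs to. It is an illustration only, not SDCStudio code: the assumption that statuses only ever move forward one step, with no error or retry states, is ours.

```python
from enum import Enum


class Status(Enum):
    """Data source statuses, declared in the order the badges progress."""
    UPLOADING = 1
    PARSING = 2
    PARSED = 3
    AGENT_PROCESSING = 4
    COMPLETED = 5


# Enum iteration follows declaration order, so this list is the normal path.
# Assumption: transitions are strictly linear; error/retry states are not modeled.
_SEQUENCE = list(Status)

# Stage 1 (structural parsing) covers the first three statuses.
_STAGE_1 = {Status.UPLOADING, Status.PARSING, Status.PARSED}


def next_status(current: Status) -> Status:
    """Return the status that follows `current`; COMPLETED is terminal."""
    position = _SEQUENCE.index(current)
    if position == len(_SEQUENCE) - 1:
        raise ValueError(f"{current.name} is terminal")
    return _SEQUENCE[position + 1]


def stage_of(status: Status) -> str:
    """Name the pipeline stage a status belongs to."""
    return "Stage 1: Structural Parsing" if status in _STAGE_1 else "Stage 2: AI Enhancement"


if __name__ == "__main__":
    status = Status.UPLOADING
    while True:
        print(f"{status.name:<16} {stage_of(status)}")
        if status is Status.COMPLETED:
            break
        status = next_status(status)
```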

2.3 Configure Your Settings (Do This First) [Core]

Before creating models, configure your profile and upload any domain-specific ontologies.

Upload domain ontologies (optional but recommended)

Standard ontologies - FHIR, NIEM, SNOMED CT, LOINC, schema.org - are already built into SDCStudio. You only need to upload your organization's custom or local domain ontologies.

  1. Click Settings → Ontologies tab
  2. Prepare your custom ontology files in Turtle (.ttl) format
  3. Click Upload Ontology, select your file, add metadata (name, description, namespace URI)
  4. Save
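If your organization's vocabulary lives in a spreadsheet rather than an existing ontology, you can still produce a small Turtle file to upload. The sketch below renders a dictionary of concepts as OWL classes with `rdfs:label` and `rdfs:comment`. Modeling each concept this way is our assumption about what makes a useful upload, not a documented SDCStudio requirement, and the `atlas` prefix, namespace URI, and concept names are hypothetical. The namespace URI you enter as upload metadata would presumably be the same one used in the `@prefix` line.

```python
def _literal(text: str) -> str:
    """Escape backslashes, quotes, and newlines for a Turtle string literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"@en'


def to_turtle(namespace: str, prefix: str, concepts: dict) -> str:
    """Render {local_name: (label, description)} as OWL classes in Turtle.

    Assumption: local names are simple alphanumeric identifiers that need no escaping.
    """
    lines = [
        f"@prefix {prefix}: <{namespace}> .",
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        "",
    ]
    for local_name, (label, description) in concepts.items():
        lines += [
            f"{prefix}:{local_name} a owl:Class ;",
            f"    rdfs:label {_literal(label)} ;",
            f"    rdfs:comment {_literal(description)} .",
            "",
        ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical namespace and concepts for the Atlas Legal case study.
    print(to_turtle(
        "https://example.org/atlas-legal#",
        "atlas",
        {
            "MatterNumber": ("Matter Number", "Identifier the firm assigns to a legal matter."),
            "PracticeArea": ("Practice Area", "Area of law a matter falls under."),
        },
    ))
```

Write the returned string to a `.ttl` file (UTF-8) and upload that file in step 3.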

Why this matters: The AI uses your uploaded ontologies to make better suggestions during processing. Better ontologies produce better models. This is the minimum knowledge modeling principle in practice - you provide the domain expertise, the AI applies the structural patterns.


2.4 Create Your First Project [Core]

  1. Navigate to Projects
  2. Click Create New Project
  3. Fill in: name (e.g., "Customer Analytics"), description, domain
  4. Click Create Project

A project is a container for related data models, components, and data sources. Think of it as a workspace for a specific engagement or use case.


2.5 Upload Data and Watch AI Processing [Core]

This is where SDCStudio demonstrates its core value. Upload a data file and watch the two-stage AI pipeline transform it into a structured, constraint-bound data model.

Upload

  1. Open your project
  2. Navigate to Data Sources tab
  3. Click Upload Data
  4. Choose your file - CSV is recommended for your first attempt (5-10 columns, clean headers)
  5. Click Upload
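Step 4 recommends 5-10 columns with clean headers. A quick local pre-flight check catches the most common problems before you spend processing time. This is a hypothetical helper, not part of SDCStudio, and our reading of "clean" (non-blank, unique, lower snake_case headers) is an assumption you can adjust.

```python
import csv
import io
import re

# Assumption: "clean" headers means lower snake_case, non-blank, and unique.
CLEAN_HEADER = re.compile(r"[a-z][a-z0-9_]*")


def preflight(csv_text: str, min_cols: int = 5, max_cols: int = 10) -> list:
    """Return warnings about a CSV before upload; an empty list means no issues found."""
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]  # skip blank lines
    if not rows:
        return ["file has no rows"]
    header = [name.strip() for name in rows[0]]
    warnings = []
    if not min_cols <= len(header) <= max_cols:
        warnings.append(f"{len(header)} columns; {min_cols}-{max_cols} is recommended for a first upload")
    if any(not name for name in header):
        warnings.append("one or more headers are blank")
    duplicates = sorted({name for name in header if name and header.count(name) > 1})
    if duplicates:
        warnings.append("duplicate headers: " + ", ".join(duplicates))
    for name in header:
        if name and not CLEAN_HEADER.fullmatch(name):
            warnings.append(f"header {name!r} is not lower snake_case")
    # Record numbers count the header as record 1 (blank lines were skipped above).
    for record_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            warnings.append(f"record {record_no} has {len(row)} fields, expected {len(header)}")
    return warnings
```

To check the lab file, open it with `newline=""` and `encoding="utf-8"` and pass its contents to `preflight`; fix anything it reports, or decide the warning does not matter for your data.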

Stage 1: Structural Parsing (30 seconds to 2 minutes)

Status: UPLOADING → PARSING → PARSED

The platform detects file format, identifies columns/fields, and infers basic data types (XdString, XdCount, XdTemporal, etc.). Structure is mapped for the AI analysis stage.
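SDCStudio's actual inference logic is not described in this module. To make the idea concrete, here is a deliberately naive sketch of column-level type inference: a type is assigned only if every non-empty value fits it, and anything ambiguous falls back to XdString. The rules and their order are our assumptions, and mapping decimal numbers to XdQuantity is one choice among several (Floats and Doubles are also SDC4 types).

```python
from datetime import date, datetime

# Illustrative heuristic only; not SDCStudio's algorithm.
BOOLEAN_WORDS = {"true", "false", "yes", "no", "y", "n"}


def _parses(parse, value: str) -> bool:
    """True if `parse(value)` succeeds without raising ValueError."""
    try:
        parse(value)
    except ValueError:
        return False
    return True


def _is_temporal(value: str) -> bool:
    """Accept ISO 8601 dates or date-times."""
    return _parses(date.fromisoformat, value) or _parses(datetime.fromisoformat, value)


def infer_type(values: list) -> str:
    """Guess an SDC4 type for one column; every non-empty value must fit the guess."""
    sample = [v.strip() for v in values if v.strip()]
    if not sample:
        return "XdString"  # nothing to go on, so use the most general type
    if all(v.lower() in BOOLEAN_WORDS for v in sample):
        return "XdBoolean"
    if all(_parses(int, v) for v in sample):
        return "XdCount"
    if all(_parses(float, v) for v in sample):
        return "XdQuantity"  # assumption: XdFloat or XdDouble are also candidates
    if all(_is_temporal(v) for v in sample):
        return "XdTemporal"
    return "XdString"
```

A rule this simple already shows why review matters: a column of two-digit codes such as "01", "02" would come back as XdCount even if it is really an identifier.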

Stage 2: AI Enhancement (1-5 minutes)

Status: AGENT_PROCESSING → COMPLETED

The AI performs:

  • Semantic analysis: understands what each field represents
  • Pattern recognition: identifies email patterns, phone formats, date formats, etc.
  • Ontology matching: uses your uploaded ontologies (and built-in standards) for concept suggestions
  • Validation rules: recommends appropriate constraints (regex patterns, ranges, enumerations)
  • Relationship detection: finds logical groupings and connections between fields
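Pattern recognition and constraint recommendation can be illustrated for a single column: test the values against a few candidate regular expressions and, failing that, check whether a small set of values repeats often enough to look like an enumeration. This is a sketch of the idea, not SDCStudio's method; the candidate patterns (including the US-only phone format) and the enumeration threshold are hypothetical.

```python
import re

# Hypothetical candidate patterns; a real pipeline would use many more.
CANDIDATE_PATTERNS = {
    "email": r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}",
    "us_phone": r"\(?\d{3}\)?[-. ]?\d{3}[-. ]\d{4}",
    "iso_date": r"\d{4}-\d{2}-\d{2}",
}


def suggest_constraint(values: list, max_enum: int = 5):
    """Suggest a pattern or enumeration constraint for one column, or None."""
    sample = [v.strip() for v in values if v.strip()]
    if not sample:
        return None
    for name, regex in CANDIDATE_PATTERNS.items():
        if all(re.fullmatch(regex, v) for v in sample):
            return {"kind": "pattern", "name": name, "regex": regex}
    distinct = sorted(set(sample))
    # Assumption: a few distinct values that repeat look like a category list.
    if len(distinct) <= max_enum and len(distinct) < len(sample):
        return {"kind": "enumeration", "values": distinct}
    return None
```

Whatever a tool suggests, the constraint still needs a domain check: an enumeration learned from one export will reject a legitimate value that simply did not appear in that file.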

The interface updates automatically as processing progresses.


2.6 Review Your Generated Data Model [Core]

Once status shows COMPLETED:

  1. Navigate to Data Models tab in your project
  2. Click on your generated model (named after your uploaded file)
  3. Explore what the AI created:
    • Data Model: the top-level structure
    • Clusters: logical groupings of related fields
    • Components: individual data elements with types, validation rules, and semantic links

The AI has created SDC4-compliant components, each with:

  • An appropriate data type from the SDC4 type hierarchy
  • Validation rules (pattern matching, range constraints, required fields)
  • Semantic enrichment (descriptions and labels informed by your ontologies)
  • Logical groupings in clusters
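The hierarchy described above (a Data Model, Clusters that group fields and may nest, and Components that carry types, rules, and semantic links) maps naturally onto a small tree. The sketch below is a simplified, hypothetical shape, not SDCStudio's internal model or export format: it omits CUID2 identifiers, Units, and provenance components. It also answers one of the exercise questions programmatically by counting components per type.

```python
from collections import Counter
from dataclasses import dataclass, field


# Simplified, hypothetical shapes; field names are assumptions.
@dataclass
class Component:
    label: str
    sdc4_type: str                                    # e.g. "XdString", "XdCount"
    constraints: dict = field(default_factory=dict)   # e.g. {"pattern": "..."}
    vocabulary: list = field(default_factory=list)    # semantic links, e.g. concept URIs


@dataclass
class Cluster:
    label: str
    items: list = field(default_factory=list)         # Components and nested Clusters


@dataclass
class DataModel:
    label: str
    data: Cluster                                     # top-level grouping of the model


def count_types(cluster: Cluster) -> Counter:
    """Count components by SDC4 type, descending into nested Clusters."""
    counts = Counter()
    for item in cluster.items:
        if isinstance(item, Cluster):
            counts.update(count_types(item))
        else:
            counts[item.sdc4_type] += 1
    return counts
```

For example, a model whose top Cluster holds a Client Name string, an Open Matters count, and a nested Contact Cluster with an Email string yields two XdString and one XdCount.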

Critical: Review Every Component

The AI pipeline is probabilistic. It will make reasonable guesses, but it will not produce perfect components. You must review every component before publication. Start with the leaf elements - Units, Strings, Booleans, Links - and work upward through the dependency chain. Check data types, constraints, labels, and vocabulary bindings. Your domain expertise is what turns an AI draft into a production-grade data model. The AI's work is a starting point, not a final product.

Click on any component to modify its properties - data type, validation rules, labels, descriptions, required/optional status. Pay particular attention to:

  • Units - did the AI assign the correct unit of measure? Is the unit label precise enough?
  • Constraints - are range limits, pattern rules, and required/optional flags correct for your domain?
  • Labels - remember from Module 1: once published, a component's label is permanently bound to its CUID2. Get it right now.
  • Vocabulary bindings - are the semantic links to standard vocabularies (SNOMED, LOINC, etc.) correct and specific enough?
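Only you can judge whether a unit, constraint, or vocabulary binding is correct, but finding components where one is missing entirely is mechanical. The sketch below flags those gaps so you can start with the weakest components. The `DraftComponent` record is hypothetical (this module does not describe an export format), and the set of types that must carry units is our assumption based on the dependency table in section 2.7.

```python
from dataclasses import dataclass, field
from typing import Optional

# Assumption: these types carry a unit of measure and should never lack one.
NEEDS_UNITS = {"XdQuantity", "XdCount"}


@dataclass
class DraftComponent:
    """Hypothetical record for one unpublished component."""
    label: str
    sdc4_type: str
    units: Optional[str] = None
    constraints: dict = field(default_factory=dict)
    vocabulary: list = field(default_factory=list)


def review_flags(c: DraftComponent) -> list:
    """Flag mechanical gaps only; correctness still needs a domain expert."""
    flags = []
    if c.sdc4_type in NEEDS_UNITS and not c.units:
        flags.append("no unit of measure")
    if not c.label.strip():
        flags.append("label is blank")
    elif c.label != c.label.strip() or "  " in c.label:
        flags.append("label has stray whitespace")  # labels are permanent once published
    if not c.constraints:
        flags.append("no constraints; confirm that is intentional")
    if not c.vocabulary:
        flags.append("no vocabulary binding")
    return flags
```

An empty flag list means nothing is obviously missing, not that the component is right; it still goes through the manual review above.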

SDCStudio display names

SDCStudio uses human-friendly names in the interface. When this curriculum refers to the SDC4 type (e.g., XdString, XdQuantity, XdBoolean), SDCStudio displays it without the "Xd" prefix - as String, Quantity, Boolean, etc. The underlying type is the same.

Mini-exercise (2 min)

Before publishing, look at the components the AI generated from the Atlas Legal CSV. Which component would you review first and why?

Answer: The name or client_name component. The source data has names in inconsistent formats. The AI may have assigned it as a single String. You might need to split it into separate PersonName and BusinessName components, or at minimum add a constraint pattern. Review the most ambiguous fields first.


2.7 Publish and Generate Outputs [Intermediate]

Publish your components and model

Publication is irreversible

Once published, a component cannot be unpublished or edited. XSD validation runs automatically before publication succeeds. Review thoroughly before publishing. If you need to change something after publication, you create a new component - the old one remains valid forever (this is the "no migration" principle from Module 1).
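The "no migration" rule behaves like an append-only registry of immutable records: publishing adds a record, nothing is edited in place, and a change means publishing a new record with a new identifier. The sketch below models that behavior. It is illustrative only: the identifiers are random stand-ins, not generated by the real CUID2 algorithm, and SDCStudio's XSD validation step is not modeled.

```python
import secrets
import string
from dataclasses import dataclass

_ALPHABET = string.ascii_lowercase + string.digits


def stand_in_id(length: int = 24) -> str:
    """Random lowercase id that starts with a letter. NOT the real CUID2 algorithm."""
    first = secrets.choice(string.ascii_lowercase)
    return first + "".join(secrets.choice(_ALPHABET) for _ in range(length - 1))


@dataclass(frozen=True)  # frozen: assigning to a field raises FrozenInstanceError
class PublishedComponent:
    cuid: str
    label: str
    sdc4_type: str


class Registry:
    """Append-only store: publishing adds; nothing is ever edited or removed."""

    def __init__(self) -> None:
        self._by_id: dict = {}

    def publish(self, label: str, sdc4_type: str) -> PublishedComponent:
        component = PublishedComponent(stand_in_id(), label, sdc4_type)
        self._by_id[component.cuid] = component
        return component

    def get(self, cuid: str) -> PublishedComponent:
        return self._by_id[cuid]
```

Trying to change the label of a published record fails; publishing the corrected label produces a second record with its own identifier, and the original stays retrievable.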

SDCStudio enforces a bottom-up publication order. You cannot publish a component until all of its dependencies are published first. If you try, SDCStudio will block publication and tell you which child components are missing. Follow this sequence:

Order | What to Publish | Why This Order
1. Leaf components | Units, Booleans, Strings, Tokens, Links, Files, Intervals, String Lists, Boolean Lists | No dependencies - these are the foundation
2a. Reference Ranges | ReferenceRange, SimpleReferenceRange | Depend on XdInterval (Level 1)
2b. Quantified types | Ordinals, Counts, Quantities, Floats, Doubles, Temporals, Decimal Lists, Integer Lists | Depend on Units and may reference ReferenceRanges
3a. Party detail Clusters | Small leaf Clusters containing only Level 1/2 components | Party needs its detail Cluster published first
3b. Party | Party components | Depends on detail Clusters (3a)
3c. Provenance components | Participations, Audits, Attestations | All depend on Party (3b)
4. Domain Clusters | Your domain data Clusters (may need multiple passes for nested Clusters) | Depend on all prior levels
5. Data Model (DM) | The root DM | Everything else must be published first

If a Cluster contains nested Clusters, publish the innermost (leaf) Clusters first, then work outward. You may need to run through the Cluster publication step more than once.
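Bottom-up publication is a topological ordering of the dependency graph, and the repeated Cluster passes correspond to grouping that ordering into rounds. The sketch below uses Python's standard `graphlib` module (Python 3.9+) to compute the passes. The dependency map is a hypothetical, hand-written slice of the Atlas Legal model; SDCStudio enforces the order itself, so this is only a way to plan your passes.

```python
from graphlib import TopologicalSorter

# Hypothetical slice of the Atlas Legal model: component -> what it depends on.
EXAMPLE_DEPS = {
    "Units: hours": set(),
    "Client Name": set(),
    "Billable Hours": {"Units: hours"},               # quantified type needs its Units
    "Contact Details": {"Client Name"},               # inner (leaf) Cluster
    "Client": {"Contact Details", "Billable Hours"},  # outer Cluster
    "Clients DM": {"Client"},                         # the root Data Model goes last
}


def publication_passes(depends_on: dict) -> list:
    """Group components into passes so every dependency is in an earlier pass."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    passes = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything publishable right now
        passes.append(ready)
        sorter.done(*ready)
    return passes


if __name__ == "__main__":
    for number, batch in enumerate(publication_passes(EXAMPLE_DEPS), start=1):
        print(f"pass {number}: {', '.join(batch)}")
```

For this map the passes are: Client Name and Units: hours; then Billable Hours and Contact Details; then Client; then Clients DM. Items within one pass can be published in any order.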

Once the DM is published, your model is locked and available for output generation. You can always create a new version of any component later - the published version remains valid forever.

Mini-exercise (2 min)

You just published a component with the label "Client Name." You realize it should be "Person Name" because the component will be reused for non-client contacts. Can you rename it?

Answer: No. Publication is irreversible. The label is permanently bound to the CUID2. You create a new component called "Person Name" with its own CUID2. The "Client Name" component remains valid forever. This is the "no migration" principle in action.

Generate outputs

Once published, you can generate outputs in any of 8 formats:

Format | What it provides
XSD Schema | XML Schema Definition for structural validation
XML Instance | Example XML document conforming to the schema
JSON Schema | JSON Schema Definition
JSON-LD | Linked data representation for semantic web integration
HTML Documentation | Human-readable documentation of the model
RDF Triples | Semantic web graph data
SHACL | RDF constraint shapes for graph-level validation
GQL | Graph database query statements

To generate: click the Generate dropdown → select output type → configure options → click Generate → download.

This is the moment where the SDC4 specification becomes concrete. One model, authored once, produces 8 interoperable output formats. The data carries its own constraints, identity, and semantic context in every format.


2.8 What You Just Did [Core]

In under an hour, you:

  1. Created an SDCStudio account and funded your wallet
  2. Uploaded a data file (CSV, Markdown, or JSON)
  3. Watched the AI build a constraint-bound, semantically enriched data model
  4. Reviewed the generated components, clusters, and validation rules
  5. Published the model
  6. Generated outputs in multiple interoperable formats

Every output you generated carries structural constraints (XSD 1.1), semantic identity (CUID2 identifiers), and vocabulary bindings - the same properties that make SDC data self-describing across system boundaries. This is the foundation for everything you will learn in the remaining modules.


Module 2 Exercise

Using the sample data from lab/sample_csv/clients.csv (the Atlas Legal case study data):

Continuing with Atlas Legal: You are now building the data model that Atlas Legal's client records need. Dana Okafor gave you a CSV export from their Clio system. Your job is to turn it into a structured, validated SDC model.

  1. Create a new project in SDCStudio with a unique name that identifies you and the case study (e.g., your initials + "Atlas Legal"). SDCStudio requires unique project names. Practice choosing descriptive, distinguishable names now - you will need this discipline with every client.
  2. Upload clients.csv
  3. Observe the two-stage processing pipeline
  4. Review the generated model - how many components were created? What types were assigned?
  5. Compare the AI's type assignments to what you would have chosen manually. Where does the AI get it right? Where would you override?
  6. Publish the model and generate the XSD Schema output
  7. Open the XSD and identify the constraint rules the AI embedded

This exercise takes approximately 20 minutes. No quiz - the hands-on experience is the learning.


Further Reading