Module 2: SDCStudio - Your First Data Model
Learning Objectives
- Create an SDCStudio account and fund your wallet.
- Navigate the SDCStudio interface (projects, data sources, models, components).
- Configure settings and upload domain ontologies.
- Upload a CSV file and observe the two-stage AI processing pipeline.
- Review an AI-generated data model and understand what was created.
- Publish a model and generate outputs in multiple formats.
2.1 Getting Started with SDCStudio [Core]
SDCStudio is a cloud-based platform - there is nothing to install. Open your browser and navigate to sdcstudio.axius-sdc.com.
Creating your account and funding your wallet
- Click Sign Up and create your account with an email address
- Navigate to Wallet and fund it with the $10 minimum
- Once your wallet is funded, the Practitioner Curriculum link appears in your user menu
The $10 is your own wallet funding - it stays yours and is used for component minting on your engagements. It is not a program fee.
2.2 Navigating the Interface [Core]
SDCStudio is a React single-page application. The main navigation provides access to:
- Dashboard - overview of your projects and recent activity
- Projects - create and manage projects (each project is a container for related models)
- Data Sources - view uploaded files and processing status
- Data Models - browse and edit your data models
- Components - manage reusable data components
- Settings - configure profile, ontologies, and preferences
Key Interface Features
- Real-time updates: The interface refreshes automatically as AI processes your data
- Status indicators: Color-coded badges show processing progress (UPLOADING → PARSING → PARSED → AGENT_PROCESSING → COMPLETED)
- Contextual actions: Buttons and menus appear based on what you are viewing
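The badge progression can be pictured as a small ordered set of states. The sketch below is illustrative only: the class and function names are hypothetical, and nothing here reflects how SDCStudio stores or exposes status internally. Only the five status names and their order come from the interface.

```python
from enum import IntEnum
from typing import Optional


class ProcessingStatus(IntEnum):
    """Status badges in the order the interface displays them."""
    UPLOADING = 0
    PARSING = 1
    PARSED = 2
    AGENT_PROCESSING = 3
    COMPLETED = 4


def next_status(current: ProcessingStatus) -> Optional[ProcessingStatus]:
    """Return the badge that follows `current`, or None once COMPLETED."""
    if current is ProcessingStatus.COMPLETED:
        return None
    return ProcessingStatus(current + 1)


def describe_progress(current: ProcessingStatus) -> str:
    """Render the full progression with the current badge in brackets."""
    return " → ".join(
        f"[{s.name}]" if s is current else s.name for s in ProcessingStatus
    )
```

Calling `describe_progress(ProcessingStatus.PARSING)` returns the full chain with PARSING bracketed, which is a quick way to read where a data source stands.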
2.3 Configure Your Settings (Do This First) [Core]
Before creating models, configure your profile and upload any domain-specific ontologies.
Upload domain ontologies (optional but recommended)
Standard ontologies - FHIR, NIEM, SNOMED CT, LOINC, schema.org - are already built into SDCStudio. You only need to upload your organization's custom or local domain ontologies.
- Click Settings → Ontologies tab
- Prepare your custom ontology files in Turtle (.ttl) format
- Click Upload Ontology, select your file, add metadata (name, description, namespace URI)
- Save
Why this matters: The AI uses your uploaded ontologies to make better suggestions during processing. Better ontologies produce better models. This is the minimum knowledge modeling principle in practice - you provide the domain expertise, the AI applies the structural patterns.
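If you do not yet have a custom ontology file, the sketch below shows roughly what a minimal Turtle file looks like by generating one. The namespace URI, the class name, and the helper functions are all hypothetical placeholders; a real domain ontology will be richer, and SDCStudio may expect more than this bare structure.

```python
# Hypothetical namespace URI; replace it with your organization's own.
NAMESPACE = "https://example.org/ontology/legal#"


def _escape(text: str) -> str:
    """Escape characters that cannot appear raw in a quoted Turtle string."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def build_turtle(namespace: str, classes: dict[str, str]) -> str:
    """Return Turtle text declaring one owl:Class per entry.

    `classes` maps a local name (e.g. "Matter") to a short description.
    """
    lines = [
        f"@prefix ex: <{namespace}> .",
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        "",
    ]
    for name, description in classes.items():
        lines += [
            f"ex:{name} a owl:Class ;",
            f'    rdfs:label "{_escape(name)}" ;',
            f'    rdfs:comment "{_escape(description)}" .',
            "",
        ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_turtle(NAMESPACE, {"Matter": "A legal matter handled for a client."}))
```

Save the printed text with a .ttl extension and use the same namespace URI in the upload metadata so the two stay consistent.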
2.4 Create Your First Project [Core]
- Navigate to Projects
- Click Create New Project
- Fill in: name (e.g., "Customer Analytics"), description, domain
- Click Create Project
A project is a container for related data models, components, and data sources. Think of it as a workspace for a specific engagement or use case.
2.5 Upload Data and Watch AI Processing [Core]
This is where SDCStudio demonstrates its core value. Upload a data file and watch the two-stage AI pipeline transform it into a structured, constraint-bound data model.
Upload
- Open your project
- Navigate to the Data Sources tab
- Click Upload Data
- Choose your file - CSV is recommended for your first attempt (5-10 columns, clean headers)
- Click Upload
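Before uploading, it can help to check the file locally against the "5-10 columns, clean headers" guidance. The following is a minimal sketch, not an SDCStudio tool: the function name is hypothetical, and the definition of a "clean" header (a letter followed by letters, digits, or underscores) is this sketch's own assumption.

```python
import csv
import io
import re

# Assumed convention for a "clean" header; SDCStudio may accept other forms.
CLEAN_HEADER = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")


def check_csv(text: str, min_cols: int = 5, max_cols: int = 10) -> list[str]:
    """Return human-readable warnings; an empty list means nothing was flagged."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty"]
    header, body = rows[0], rows[1:]
    warnings = []
    if not min_cols <= len(header) <= max_cols:
        warnings.append(
            f"{len(header)} columns; {min_cols}-{max_cols} suggested for a first upload"
        )
    for name in header:
        if not CLEAN_HEADER.match(name.strip()):
            warnings.append(f"header {name!r} is not a clean identifier")
    if len(set(header)) != len(header):
        warnings.append("duplicate header names")
    # Row numbers count the header as row 1
    for row_no, row in enumerate(body, start=2):
        if len(row) != len(header):
            warnings.append(
                f"row {row_no}: expected {len(header)} fields, found {len(row)}"
            )
    return warnings
```

To run it on a real file, read the file's text (for example with `pathlib.Path(...).read_text()`) and pass it to `check_csv`. A warning is a prompt to look at the file, not a guarantee that the upload would fail.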
Stage 1: Structural Parsing (30 seconds to 2 minutes)
Status: UPLOADING → PARSING → PARSED
The platform detects the file format, identifies columns/fields, and infers basic data types (XdString, XdCount, XdTemporal, etc.). The resulting structure is then handed to the AI analysis stage.
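To make "infers basic data types" concrete, here is a toy version of column-level type guessing. It is a sketch under stated assumptions: the rules below (booleans, then integers, then ISO dates, else text) are invented for illustration and are not SDCStudio's actual inference logic. Only the type names come from SDC4.

```python
from datetime import date


def infer_type(values: list[str]) -> str:
    """Guess an SDC4 type name for one column from its string values."""
    cleaned = [v.strip() for v in values if v.strip()]
    if not cleaned:
        return "XdString"  # nothing to go on; fall back to text

    def all_parse(parse) -> bool:
        """True if `parse` accepts every non-empty value in the column."""
        for v in cleaned:
            try:
                parse(v)
            except ValueError:
                return False
        return True

    if all(v.lower() in {"true", "false"} for v in cleaned):
        return "XdBoolean"
    if all_parse(int):
        return "XdCount"
    if all_parse(date.fromisoformat):
        return "XdTemporal"
    return "XdString"
```

Even this toy shows why review matters: a column of ZIP codes parses as integers and would be guessed as XdCount, although it is really an identifier string.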
Stage 2: AI Enhancement (1-5 minutes)
Status: AGENT_PROCESSING → COMPLETED
The AI performs:
- Semantic analysis: understands what each field represents
- Pattern recognition: identifies email patterns, phone formats, date formats, etc.
- Ontology matching: uses your uploaded ontologies (and built-in standards) for concept suggestions
- Validation rules: recommends appropriate constraints (regex patterns, ranges, enumerations)
- Relationship detection: finds logical groupings and connections between fields
The interface updates automatically as processing progresses.
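The pattern-recognition and validation-rule steps above can be approximated with simple heuristics. The sketch below is illustrative only: the function name, the email regex, and the thresholds are assumptions made for this example, and the real pipeline is AI-driven rather than rule-based.

```python
import re

# Deliberately loose email shape: something@something.something, no spaces.
EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def suggest_constraint(values: list[str], max_enum_size: int = 5) -> dict:
    """Propose one constraint for a column, or {'kind': 'none'} if nothing fits."""
    cleaned = [v.strip() for v in values if v.strip()]
    if not cleaned:
        return {"kind": "none"}
    if all(EMAIL.match(v) for v in cleaned):
        return {"kind": "pattern", "regex": EMAIL.pattern}
    try:
        numbers = [float(v) for v in cleaned]
    except ValueError:
        numbers = None
    if numbers is not None:
        # Observed bounds only; the true domain range may be wider
        return {"kind": "range", "min": min(numbers), "max": max(numbers)}
    distinct = sorted(set(cleaned))
    # A few distinct values that repeat suggest a closed list of codes
    if len(distinct) <= max_enum_size and len(distinct) < len(cleaned):
        return {"kind": "enumeration", "values": distinct}
    return {"kind": "none"}
```

A range taken from sample data reflects only what happened to be in the file, which is one reason suggested constraints need a domain expert's review.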
2.6 Review Your Generated Data Model [Core]
Once status shows COMPLETED:
- Navigate to the Data Models tab in your project
- Click on your generated model (named after your uploaded file)
- Explore what the AI created:
- Data Model: the top-level structure
- Clusters: logical groupings of related fields
- Components: individual data elements with types, validation rules, and semantic links
The AI has created SDC4-compliant components, each with:
- An appropriate data type from the SDC4 type hierarchy
- Validation rules (pattern matching, range constraints, required fields)
- Semantic enrichment (descriptions and labels informed by your ontologies)
- Logical groupings in clusters
Critical: Review Every Component
The AI pipeline is probabilistic. It will make reasonable guesses, but it will not produce perfect components. You must review every component before publication. Start with the leaf elements - Units, Strings, Booleans, Links - and work upward through the dependency chain. Check data types, constraints, labels, and vocabulary bindings. Your domain expertise is what turns an AI draft into a production-grade data model. The AI's work is a starting point, not a final product.
Click on any component to modify its properties - data type, validation rules, labels, descriptions, required/optional status. Pay particular attention to:
- Units - did the AI assign the correct unit of measure? Is the unit label precise enough?
- Constraints - are range limits, pattern rules, and required/optional flags correct for your domain?
- Labels - remember from Module 1: once published, a component's label is permanently bound to its CUID2. Get it right now.
- Vocabulary bindings - are the semantic links to standard vocabularies (SNOMED, LOINC, etc.) correct and specific enough?
SDCStudio display names
SDCStudio uses human-friendly names in the interface. When this curriculum refers to the SDC4 type (e.g., XdString, XdQuantity, XdBoolean), SDCStudio displays it without the "Xd" prefix - as String, Quantity, Boolean, etc. The underlying type is the same.
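The naming rule is simple enough to state in one line of code. This helper is hypothetical and exists only to illustrate the mapping between curriculum names and interface names.

```python
def display_name(sdc4_type: str) -> str:
    """Drop a leading 'Xd' so a curriculum name matches the interface label."""
    return sdc4_type.removeprefix("Xd")


# The three types named above, as the interface would label them
labels = [display_name(t) for t in ("XdString", "XdQuantity", "XdBoolean")]
```

Names without the prefix, such as Cluster, pass through unchanged.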
Mini-exercise (2 min)
Before publishing, look at the components the AI generated from the Atlas Legal CSV. Which component would you review first and why?
Answer: The name or client_name component. The source data has names in inconsistent formats. The AI may have assigned it as a single String. You might need to split it into separate PersonName and BusinessName components, or at minimum add a constraint pattern. Review the most ambiguous fields first.
2.7 Publish and Generate Outputs [Intermediate]
Publish your components and model
Publication is irreversible
Once published, a component cannot be unpublished or edited. XSD validation runs automatically before publication succeeds. Review thoroughly before publishing. If you need to change something after publication, you create a new component - the old one remains valid forever (this is the "no migration" principle from Module 1).
SDCStudio enforces a bottom-up publication order. You cannot publish a component until all of its dependencies are published first. If you try, SDCStudio will block publication and tell you which child components are missing. Follow this sequence:
| Order | What to Publish | Why This Order |
|---|---|---|
| 1. Leaf components | Units, Booleans, Strings, Tokens, Links, Files, Intervals, String Lists, Boolean Lists | No dependencies - these are the foundation |
| 2a. Reference Ranges | ReferenceRange, SimpleReferenceRange | Depend on XdInterval (Level 1) |
| 2b. Quantified types | Ordinals, Counts, Quantities, Floats, Doubles, Temporals, Decimal Lists, Integer Lists | Depend on Units and may reference ReferenceRanges |
| 3a. Party detail Clusters | Small leaf Clusters containing only Level 1/2 components | Party needs its detail Cluster published first |
| 3b. Party | Party components | Depends on detail Clusters (3a) |
| 3c. Provenance components | Participations, Audits, Attestations | All depend on Party (3b) |
| 4. Domain Clusters | Your domain data Clusters (may need multiple passes for nested Clusters) | Depend on all prior levels |
| 5. Data Model (DM) | The root DM | Everything else must be published first |
If a Cluster contains nested Clusters, publish the innermost (leaf) Clusters first, then work outward. You may need to run through the Cluster publication step more than once.
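Bottom-up publication is a topological ordering of the dependency graph: every component appears after everything it depends on. The sketch below uses Python's standard `graphlib` module to compute such an order. The component names and their dependencies are hypothetical examples, and SDCStudio does not expose its dependency graph in this form; the point is only to show why the table's sequence works.

```python
from graphlib import TopologicalSorter

# Hypothetical components; each key maps to the components it depends on.
dependencies = {
    "Currency Units": set(),                      # level 1: leaf
    "Client Name": set(),                         # level 1: leaf
    "Retainer Amount": {"Currency Units"},        # level 2b: quantified type
    "Party Details": {"Client Name"},             # level 3a: detail Cluster
    "Client Party": {"Party Details"},            # level 3b: Party
    "Intake Audit": {"Client Party"},             # level 3c: provenance
    "Client Record": {"Retainer Amount", "Intake Audit"},  # level 4: domain Cluster
    "Client Data Model": {"Client Record"},       # level 5: root DM
}


def publication_order(deps: dict[str, set[str]]) -> list[str]:
    """Return an order in which every component follows its dependencies.

    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    for step, name in enumerate(publication_order(dependencies), start=1):
        print(step, name)
```

Several valid orders usually exist (the two leaf components can go in either order, for example), but in this example the root Data Model always comes last because everything else feeds into it.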
Once the DM is published, your model is locked and available for output generation. You can always create a new version of any component later - the published version remains valid forever.
Mini-exercise (2 min)
You just published a component with the label "Client Name." You realize it should be "Person Name" because the component will be reused for non-client contacts. Can you rename it?
Answer: No. Publication is irreversible. The label is permanently bound to the CUID2. You create a new component called "Person Name" with its own CUID2. The "Client Name" component remains valid forever. This is the "no migration" principle in action.
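The same idea can be expressed as an immutable record: a published component cannot be changed in place, so "renaming" means minting a new one. This is a conceptual sketch only. The class and function names are hypothetical, and the identifier is a UUID used as a stand-in because generating a real CUID2 needs a dedicated library.

```python
from dataclasses import dataclass, field, FrozenInstanceError
from uuid import uuid4


def new_identifier() -> str:
    # Stand-in only: SDC4 uses CUID2 identifiers, which this sketch does not generate.
    return uuid4().hex


@dataclass(frozen=True)
class PublishedComponent:
    """A label bound permanently to its identifier."""
    label: str
    identifier: str = field(default_factory=new_identifier)


def relabel(old: PublishedComponent, new_label: str) -> PublishedComponent:
    """Mint a new component with its own identifier; `old` is left untouched."""
    return PublishedComponent(label=new_label)


client = PublishedComponent("Client Name")
person = relabel(client, "Person Name")

try:
    client.label = "Person Name"  # editing a published component is rejected
except FrozenInstanceError:
    pass
```

After this runs, both `client` and `person` exist with different identifiers, mirroring how "Client Name" stays valid alongside the new "Person Name".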
Generate outputs
Once published, you can generate outputs in any of 8 formats:
| Format | What it provides |
|---|---|
| XSD Schema | XML Schema Definition for structural validation |
| XML Instance | Example XML document conforming to the schema |
| JSON Schema | JSON Schema Definition |
| JSON-LD | Linked data representation for semantic web integration |
| HTML Documentation | Human-readable documentation of the model |
| RDF Triples | Semantic web graph data |
| SHACL | RDF constraint shapes for graph-level validation |
| GQL | Graph database query statements |
To generate: click the Generate dropdown → select output type → configure options → click Generate → download.
This is the moment where the SDC4 specification becomes concrete. One model, authored once, produces 8 interoperable output formats. The data carries its own constraints, identity, and semantic context in every format.
2.8 What You Just Did [Core]
In under an hour, you:
- Created an SDCStudio account and funded your wallet
- Uploaded a data file (CSV, Markdown, or JSON)
- Watched the AI build a constraint-bound, semantically enriched data model
- Reviewed the generated components, clusters, and validation rules
- Published the model
- Generated outputs in multiple interoperable formats
Every output you generated carries structural constraints (XSD 1.1), semantic identity (CUID2 identifiers), and vocabulary bindings - the same properties that make SDC data self-describing across system boundaries. This is the foundation for everything you will learn in the remaining modules.
Module 2 Exercise
Using the sample data from lab/sample_csv/clients.csv (the Atlas Legal case study data):
Continuing with Atlas Legal: You are now building the data model that Atlas Legal's client records need. Dana Okafor gave you a CSV export from their Clio system. Your job is to turn it into a structured, validated SDC model.
- Create a new project in SDCStudio with a unique name that identifies you and the case study (e.g., your initials + "Atlas Legal"). SDCStudio requires unique project names. Practice choosing descriptive, distinguishable names now - you will need this discipline with every client.
- Upload clients.csv
- Observe the two-stage processing pipeline
- Review the generated model - how many components were created? What types were assigned?
- Compare the AI's type assignments to what you would have chosen manually. Where does the AI get it right? Where would you override?
- Publish the model and generate the XSD Schema output
- Open the XSD and identify the constraint rules the AI embedded
This exercise takes approximately 20 minutes. No quiz - the hands-on experience is the learning.
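For the last step, a short script can list the restriction facets in the downloaded XSD so you have a checklist to read against. This is a minimal sketch: it assumes constraints appear as standard `xs:restriction` facets carrying a `value` attribute, which may not cover everything in the generated schema (XSD 1.1 `xs:assert` rules, for instance, use a `test` attribute and are not reported here). The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree writes namespaced tags as '{namespace-uri}localname'
XS = "{http://www.w3.org/2001/XMLSchema}"


def list_facets(xsd_text: str) -> list[tuple[str, str, str]]:
    """Return (base type, facet name, facet value) for every restriction facet."""
    root = ET.fromstring(xsd_text)
    found = []
    for restriction in root.iter(f"{XS}restriction"):
        base = restriction.get("base", "")
        for child in restriction:
            name = child.tag.split("}", 1)[-1]  # keep only the local name
            value = child.get("value")
            if value is not None:  # skip children that are not simple facets
                found.append((base, name, value))
    return found
```

Pass it the text of your downloaded schema and compare each reported facet (pattern, minInclusive, maxLength, enumeration, and so on) with the constraint you reviewed in the interface.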