Data Modeling Guide
Overview
Data modeling in SDCStudio involves creating, customizing, and managing SDC4-compliant data structures. Whether you're refining an AI-generated model or creating one from scratch, this guide covers model structure, component editing, clusters, validation, and publishing.
Understanding Data Models
Model Structure
Data Model
└── Data Cluster
    ├── Component 1 (XdString)
    ├── Component 2 (XdCount)
    ├── Component 3 (XdTemporal)
    ├── Component 4 (XdString)
    └── Component 5 (XdBoolean)
Sovereign Identity & Source Provenance
Every Data Model is automatically assigned a mandatory CUID2 instance_id — a sovereign identifier that is decoupled from any storage or transport mechanism. This ID is auto-generated and read-only.
When ingesting data from an external source system (e.g., Epic, SAP), you can optionally record:
- Source Instance ID: The identifier of the data instance in the originating system
- Source Version ID: The version identifier from the originating system
These fields enable auditable data lineage across system boundaries without conflating the SDC identity with the legacy source identity.
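The separation between the SDC identity and the source identity can be sketched as a small record type. This is an illustration only: the class name `ModelIdentity`, the attribute names, and the sample values are hypothetical, and the shape check assumes a CUID2 at its default length of 24 lowercase base-36 characters starting with a letter. SDCStudio generates the real `instance_id` itself.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumption: a default-length CUID2 is 24 lowercase base-36 characters
# and starts with a letter.
CUID2_SHAPE = re.compile(r"[a-z][a-z0-9]{23}")


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class ModelIdentity:
    instance_id: str                          # sovereign SDC identifier (read-only)
    source_instance_id: Optional[str] = None  # ID in the originating system
    source_version_id: Optional[str] = None   # version in the originating system

    def __post_init__(self) -> None:
        if not CUID2_SHAPE.fullmatch(self.instance_id):
            raise ValueError(f"not a CUID2-shaped id: {self.instance_id!r}")


# Hypothetical values: the source identifiers travel with the record
# but never replace the SDC identity.
identity = ModelIdentity(
    instance_id="tz4a98xxat96iws9zmbrgj3a",
    source_instance_id="EPIC-000123",
    source_version_id="7",
)
```

Keeping the source identifiers optional mirrors the guide: they are recorded only when data is ingested from an external system.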
- Data Model: Top-level container for your entire data structure
- Data Cluster: Logical grouping of related components
- Components: Individual data elements (fields/columns)
Component Types
SDCStudio supports all SDC4 data types:
Primitive Types:
- XdString: Text data with validation
- XdCount: Integer values
- XdQuantity: Decimal numbers with units
- XdBoolean: True/false values
- XdTemporal: Dates, times, durations
- XdFloat/XdDouble: Floating-point numbers
Complex Types:
- Clusters: Grouped components
- Lists: Ordered collections
- References: Links to other components
- Adapters: Special wrappers
See Data Types Reference for complete details.
Working with AI-Generated Models
Reviewing Generated Models
After uploading a file, SDCStudio automatically creates:
- Data Model: Named after your file (e.g., customers)
- Data Cluster: Groups all columns (e.g., customers_cluster)
- Components: One per column with appropriate types
Example (from CSV with columns: customer_id, name, email, signup_date):
Data Model: customers
└── Cluster: customers_cluster
    ├── XdCount: customer_id
    ├── XdString: name
    ├── XdString: email
    └── XdTemporal: signup_date
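To make the column-to-type mapping in this example concrete, here is a rough sketch of how a type could be guessed from column values. The heuristics, the function name `infer_type`, and the sample rows are all assumptions made for illustration; they are not SDCStudio's actual AI analysis, which is not described in this guide.

```python
import csv
import io
import re
from datetime import date


def infer_type(values: list[str]) -> str:
    """Guess an SDC4 type name from a column's string values (toy heuristic)."""
    cells = [v.strip() for v in values if v.strip()]
    if not cells:
        return "XdString"
    if all(c.lower() in {"true", "false"} for c in cells):
        return "XdBoolean"
    if all(re.fullmatch(r"-?[0-9]+", c) for c in cells):
        return "XdCount"
    try:
        for c in cells:
            date.fromisoformat(c)  # accepts ISO 8601 dates such as 2024-01-05
    except ValueError:
        return "XdString"
    return "XdTemporal"


# Hypothetical sample data with the same columns as the example above.
SAMPLE = """customer_id,name,email,signup_date
1,Ada Lovelace,ada@example.com,2024-01-05
2,Alan Turing,alan@example.com,2024-02-17
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
columns = {name: [row[name] for row in rows] for name in rows[0]}
inferred = {name: infer_type(vals) for name, vals in columns.items()}
```

A heuristic like this falls back to XdString whenever a column is ambiguous, which is why the generated types should still be reviewed.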
Navigating Your Model
- Navigate to Models: Go to /app/models or click "Browse Data Models" on the dashboard
- Find Your Model in the list (you can filter by project)
- Click the Model Name to view details
- See:
  - Model metadata (name, description, status)
  - Data cluster structure
  - All components
  - Validation status
Quick Review Checklist
✅ Verify Component Types: Are all types correct?
✅ Check Labels: Are names user-friendly?
✅ Review Validation: Are constraints appropriate?
✅ Test Required Fields: Should any be optional?
✅ Check Documentation: Are descriptions clear?
Editing Components
Access Component Editor
- Navigate to Components: Go to /app/components
- Find the Component you want to edit
  - Use the search box to filter by name
  - Use dropdowns to filter by type, project, or status
- Click the Component to view its detail page
- Click the "Edit" button to open the stepped edit form (only visible if component is unpublished)
Stepped Edit Form
The component editor uses a wizard-style stepped layout with 5 steps (or 4 for structural types like Clusters):
- Basic Info: Label, description, project, language, published/public status
- Type Config: Type-specific fields (regex for XdString, range for XdCount, etc.)
- Advanced Settings: Sequence number, UI type, validation flags (ACT, VTB, VTE, TR, MOD, Location)
- Semantic Linking: Select RDF predicate objects to link semantic meaning
- Review & Save: Summary of all fields with Save and Save & Publish buttons
Navigation: In edit mode, all steps are accessible. Click any step in the stepper to jump directly to it; you don't need to follow the steps in order.
Component Properties
Every component has these core properties:
Basic Information
Label:
- Human-readable name
- Displayed in forms and documentation
- Example: "Customer Email Address"
Technical Name:
- Machine-readable identifier
- Used in code generation
- Example: customer_email
Description:
- Detailed explanation of purpose
- Usage guidelines
- Business context
Data Type Configuration
XdString (Text):
Min Length: 1
Max Length: 255
Pattern: [A-Za-z\s]+ (letters and spaces only)
Default Value: (optional)
XdCount (Integer):
Min Value: 0
Max Value: 999999
Units: count
Precision: 0 (no decimals)
Default Value: 0
XdQuantity (Decimal):
Min Value: 0.0
Max Value: 999999.99
Units: kg, meters, etc.
Precision: 2 (two decimal places)
Default Value: 0.0
XdTemporal (Date/Time):
Allowed Types: date, time, datetime, duration
Format: ISO 8601 (YYYY-MM-DD)
Min Date: 1900-01-01
Max Date: 2100-12-31
Default Value: (optional)
XdBoolean (True/False):
Default Value: true or false
Display As: checkbox, radio, toggle
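As an illustration of how the XdQuantity settings above interact, the following sketch checks a value against a range and a precision. The names `QuantityConfig` and `check_quantity` are hypothetical and not part of SDCStudio; the sketch also assumes that Precision means the maximum number of decimal places, as the "two decimal places" note suggests.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class QuantityConfig:
    min_value: Decimal
    max_value: Decimal
    precision: int  # assumed meaning: maximum number of decimal places
    units: str


def check_quantity(value: str, cfg: QuantityConfig) -> list[str]:
    """Return a list of problems; an empty list means the value is acceptable."""
    number = Decimal(value)  # raises decimal.InvalidOperation for non-numeric text
    if not number.is_finite():
        return [f"{value} is not a finite number"]
    problems = []
    if not (cfg.min_value <= number <= cfg.max_value):
        problems.append(f"{number} is outside {cfg.min_value}..{cfg.max_value}")
    decimals = max(0, -number.as_tuple().exponent)
    if decimals > cfg.precision:
        problems.append(f"{number} has more than {cfg.precision} decimal places")
    return problems


# The XdQuantity example values from this section, with kg as the unit.
weight = QuantityConfig(Decimal("0.0"), Decimal("999999.99"), precision=2, units="kg")
```

Using `Decimal` rather than `float` keeps the decimal-place count exact, so a value such as 12.50 is not distorted by binary rounding.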
Validation Rules
Required vs Optional:
- Required: Must have a value
- Optional: Can be null/empty
Pattern Validation (for XdString):
- Regular expression patterns (must follow XML Schema regex rules; a pattern always matches the entire value, so ^ and $ anchors are not used)
- Use the XML Regex Reference & Sandbox to browse common patterns and test your own
- Common patterns:
  - Email: [^@]+@[^@]+\.[^@]+
  - Phone: \+?[1-9]\d{1,14}
  - Zip: \d{5}(-\d{4})?
  - URL: https?://.*
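XML Schema patterns are matched against the whole value, and `^` and `$` are not anchors in that dialect, so the sketch below writes the common patterns without them and approximates the behavior with Python's `re.fullmatch`. This is only an approximation for quick local checks: Python's regex dialect differs from XML Schema's in several details, and the `matches` helper is a hypothetical name. The XML Regex Reference & Sandbox remains the place to confirm a pattern.

```python
import re

# The common patterns, written without ^ and $ because XML Schema
# always matches a pattern against the entire value.
PATTERNS = {
    "email": r"[^@]+@[^@]+\.[^@]+",
    "phone": r"\+?[1-9]\d{1,14}",
    "zip": r"\d{5}(-\d{4})?",
    "url": r"https?://.*",
}


def matches(kind: str, value: str) -> bool:
    """Approximate whole-value XML Schema matching with re.fullmatch."""
    return re.fullmatch(PATTERNS[kind], value) is not None


print(matches("zip", "12345-6789"))  # whole value fits the pattern
print(matches("zip", "12345-67"))    # trailing part is too short
```

`re.fullmatch` is used instead of `re.search` because a partial match would accept values that an XML Schema validator rejects.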
AI-Assisted Regex Generation (XdString only):
The Format Pattern field includes a wand button that asks the LLM to suggest an XML Schema regex. The suggestion is based on the component's label, description, and any format: examples you include in the description.
To get the best results:
- Use a descriptive label (e.g., "US Phone Number" rather than "phone")
- Add format: examples in the description field — e.g., format: (555) 123-4567, 555-123-4567
- The wand button is only active when the str_fmt field is empty and no enumerations are present
The generated pattern follows XML Schema regex rules (no anchors, no lookahead, max 60 characters). You can always edit the suggestion afterward. Use the XML Regex Reference & Sandbox to test and validate patterns before saving.
Note: Google Cloud Vertex AI enforces safety filters that may truncate suggestions for PII-related fields (SSN, credit card, passport, etc.). SDCStudio sanitizes known PII terms automatically, but if you get a truncated regex, use a generic label or write the pattern manually. See Vertex AI Safety Filters for details.
Enumeration (allowed values):
Allowed Values: ['active', 'inactive', 'pending']
Default: 'active'
Range Constraints:
- Minimum and maximum values
- For numeric and temporal types
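The required, enumeration, and range rules can be pictured with two small helpers. These are hypothetical functions written for illustration, not SDCStudio code, and they treat both `None` and the empty string as "no value".

```python
from datetime import date
from typing import Optional, Sequence

STATUS_VALUES = ("active", "inactive", "pending")  # the allowed values shown above


def check_enumerated(
    value: Optional[str], allowed: Sequence[str], required: bool
) -> list[str]:
    """Return problems for a value constrained to an enumeration."""
    if value is None or value == "":
        return ["value is required"] if required else []
    if value not in allowed:
        return [f"{value!r} is not one of {list(allowed)}"]
    return []


def in_range(value, minimum, maximum) -> bool:
    """Inclusive range check; works for numbers and for dates alike."""
    return minimum <= value <= maximum


print(check_enumerated("archived", STATUS_VALUES, required=True))
print(in_range(date(2024, 1, 5), date(1900, 1, 1), date(2100, 12, 31)))
```

The same `in_range` helper covers numeric and temporal types because both support ordered comparison.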
Example: Editing an Email Component
Before (AI-generated):
Label: email
Type: XdString
Max Length: 255
Pattern: (none)
Required: false
After (customized):
Label: Customer Email Address
Type: XdString
Min Length: 5
Max Length: 320 (64-character local part + "@" + 255-character domain, the RFC 5321 component limits)
Pattern: [a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}
Required: true
Description: Primary contact email for customer communications
Saving Changes
- Navigate to the Review & Save step (Step 5) in the edit form
- Review Your Edits: The Review step shows a summary of all field values
- Choose Save Option:
  - "Save": Saves as a draft that can still be edited later
  - "Save & Publish": Saves AND publishes; the component CANNOT be edited after this
- Verify: You'll be redirected to the component detail page
- Test: Confirm that validation rules work as expected
⚠️ WARNING: Once you click "Save & Publish", the component becomes permanent and cannot be edited. Only use "Save & Publish" when the component is completely finalized.
Creating Components Manually
Add a New Component
- Navigate to Components: Go to /app/components
- Click "Create New" button
- Component Wizard guides you through creation:
  - Step 1 - Choose Type: Select the SDC4 type (XdString for text, XdCount for integers, etc.)
  - Step 2 - Basic Info: Enter label, description, select project, choose language
  - Step 3 - Type Config: Configure type-specific properties (regex/length for XdString, range for XdCount, format for XdTemporal, etc.)
  - Step 4 - Advanced Settings: Set sequence number, UI type, validation flags
  - Step 5 - Semantic Linking: Link to RDF predicate objects for semantic meaning
  - Step 6 - Review: Review all settings and click "Create" to save
- You'll be redirected to the new component's detail page
Component Naming Best Practices
Technical Names:
- Use lowercase with underscores: customer_email
- Be descriptive: order_total_amount not total
- Avoid abbreviations: quantity not qty
- Be consistent: first_name, last_name (not firstName, surname)
Labels:
- Use proper capitalization: "Customer Email"
- Be user-friendly: "Email Address" not "email_addr"
- Include context: "Order Total Amount" not just "Total"
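The naming conventions above are easy to check mechanically. The sketch below uses two hypothetical helpers: one derives a lowercase, underscore-separated technical name from a label, and one tests whether a name already follows that convention. How SDCStudio itself derives technical names is not stated in this guide, so treat this purely as an illustration of the convention.

```python
import re


def to_technical_name(label: str) -> str:
    """Derive a lowercase, underscore-separated name from a human-readable label."""
    words = re.findall(r"[A-Za-z0-9]+", label)
    return "_".join(word.lower() for word in words)


def is_technical_name(name: str) -> bool:
    """True for lowercase snake_case identifiers such as customer_email."""
    return re.fullmatch(r"[a-z][a-z0-9]*(_[a-z0-9]+)*", name) is not None


print(to_technical_name("Customer Email Address"))
print(is_technical_name("firstName"))
```

A check like this catches mixed conventions (for example `firstName` next to `last_name`), but it cannot judge whether a name is descriptive or free of abbreviations; that still needs review.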
Working with Clusters
Creating Clusters
Clusters group related components logically.
Example: Customer Information
Customer Cluster
├── customer_id (XdCount)
├── first_name (XdString)
├── last_name (XdString)
├── date_of_birth (XdTemporal)
├── email (XdString)
├── phone (XdString)
├── address (XdString)
├── signup_date (XdTemporal)
├── status (XdString)
└── total_purchases (XdCount)
Create a New Cluster
- Navigate to Components: Go to /app/components
- Click "Create New", then select "Cluster" from the type list
- Cluster Wizard guides you through a 4-step flow (no Advanced Settings step for Clusters):
  - Step 1 - Basic Info: Label, description, project selection
  - Step 2 - Component Selection: Use the TransferList to select which components belong in this cluster. Available components appear on the left; move them to the right to include them.
  - Step 3 - Semantic Linking: Link to RDF predicate objects
  - Step 4 - Review: Review and click "Create"
Add Components to Cluster
You can add components to a cluster when creating the cluster (via the TransferList in Step 2), or by editing the cluster later:
- Edit the Cluster: Navigate to the cluster's detail page and click "Edit"
- Go to the Component Selection step: Use the TransferList to add/remove components
- Navigate to Review & Save: Click "Save" to apply changes
Validation and Quality Checks
Built-in Validation
SDCStudio validates your model automatically:
Type Validation:
- Ensures components use valid SDC4 types
- Checks for proper type configurations
Name Validation:
- Unique technical names
- Valid identifiers (no special characters)
- No reserved keywords
Constraint Validation:
- Min < Max for ranges
- Valid regex patterns
- Consistent enumeration values
Structure Validation:
- No circular references
- Valid cluster membership
- Proper component relationships
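A few of these checks can be reproduced locally before opening the model in SDCStudio. The sketch below is a hypothetical pre-check over plain dictionaries; the keys `name`, `min`, `max`, and `pattern` are assumptions made for the example, and `re.compile` only approximates XML Schema regex syntax.

```python
import re
from collections import Counter


def find_issues(components: list[dict]) -> list[str]:
    """Report duplicate names, inverted ranges, and uncompilable patterns."""
    issues = []
    counts = Counter(c["name"] for c in components)
    for name, n in counts.items():
        if n > 1:
            issues.append(f"Duplicate technical name: {name}")
    for c in components:
        lo, hi = c.get("min"), c.get("max")
        if lo is not None and hi is not None and lo >= hi:
            issues.append(f"Min must be less than Max: {c['name']}")
        pattern = c.get("pattern")
        if pattern is not None:
            try:
                re.compile(pattern)  # Python syntax check, not a full XSD check
            except re.error:
                issues.append(f"Invalid pattern: {c['name']}")
    return issues


# Hypothetical components containing one problem of each kind.
sample = [
    {"name": "email", "pattern": "[a-z"},
    {"name": "email"},
    {"name": "age", "min": 10, "max": 5},
]
issues = find_issues(sample)
```

Returning every issue at once, rather than stopping at the first, matches the way the validation status lists warnings and errors together.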
View Validation Issues
- Navigate to Models: Go to /app/models
- Find Your Model in the list
- Click the Model to view its detail page
- Check Validation Status:
  - ✅ Valid: Ready to publish
  - ⚠️ Warnings: Issues to review
  - ❌ Errors: Must fix before publishing
Fix Common Issues
"Duplicate technical name":
- Rename one of the components
- Ensure all names are unique within the model
"Invalid pattern":
- Check regex syntax
- Test pattern with the XML Regex Reference & Sandbox
- Use simpler pattern if complex
"Min greater than Max":
- Adjust range constraints
- Ensure Min < Max
Publishing Models
⚠️ CRITICAL: Publication is a ONE-TIME, Irreversible Operation
Once a component or model is published, it CANNOT be:
- Edited
- Unpublished
- Modified
- Deleted (in normal user operations)
This is a core SDC principle ensuring data model integrity and reproducibility.
When to Publish
Publish your model ONLY when:
- ✅ All components are completely finalized
- ✅ No errors in validation check
- ✅ All required components added and tested
- ✅ Validation rules thoroughly tested
- ✅ Documentation complete and reviewed
- ✅ Stakeholders have approved the model
- ✅ You are CERTAIN this is the final version
Consider creating a test model first to validate your approach before publishing the production version.
Publish Process
IMPORTANT: You must publish child components BEFORE publishing the parent model.
- Publish All Child Components First:
  - Navigate to each component used in your model
  - Review and verify each component is correct
  - Click "Save & Publish" for each component
  - Ensure ALL child components show "Published" status before proceeding
- Publish the Data Model:
  - Navigate to Models: Go to /app/models
  - Find Your Model in the list
  - Click the Model to view details
  - Final Review: Verify ALL components are published and settings are correct
  - Click the "Publish" button on the model detail page
    - This can only be done ONCE
    - The button will disappear after publishing
- Generate Package: After publishing the model, click "Generate Package (One-Time Only)"
  - This generates ALL export files (XSD, XML, JSON, HTML, etc.)
  - This can also only be done ONCE
Status Changes: DRAFT → PUBLISHED (permanent)
⚠️ Critical Workflow: Child components → Model → Package generation. You cannot publish a model if its child components are not yet published.
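The ordering and one-time rules of this workflow can be modeled as a small state check. The classes below are a hypothetical sketch of the rules as this guide describes them, not SDCStudio's implementation; the class names, method names, and error messages are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    name: str
    published: bool = False

    def publish(self) -> None:
        if self.published:
            raise RuntimeError(f"{self.name} is already published")
        self.published = True  # DRAFT -> PUBLISHED, with no way back


@dataclass
class Model:
    name: str
    components: list[Component] = field(default_factory=list)
    published: bool = False
    package_generated: bool = False

    def publish(self) -> None:
        if self.published:
            raise RuntimeError("a model can be published only once")
        drafts = [c.name for c in self.components if not c.published]
        if drafts:
            raise RuntimeError(f"publish child components first: {drafts}")
        self.published = True

    def generate_package(self) -> None:
        if not self.published:
            raise RuntimeError("publish the model before generating the package")
        if self.package_generated:
            raise RuntimeError("the package can be generated only once")
        self.package_generated = True


# Child components -> model -> package generation, in that order.
email = Component("customer_email")
model = Model("customers", [email])
email.publish()
model.publish()
model.generate_package()
```

There is deliberately no `unpublish` method: in this sketch, as in the guide, the only way to change a published model is to create a new one.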
What Publishing Enables
Once published and generated, you can:
- ✅ Download XSD schemas
- ✅ Download XML instances
- ✅ Download JSON schemas and instances
- ✅ Download HTML documentation
- ✅ Download complete ZIP package
- ✅ Share generated files with team members
What You CANNOT Do After Publishing
After publishing, you CANNOT:
- ❌ Edit the model or any of its components
- ❌ Unpublish the model
- ❌ Delete the model (through normal UI)
- ❌ Regenerate the export package
- ❌ Modify any component definitions
If You Need to Make Changes
If you discover issues after publishing:
- Create a NEW model with a new version number or name
- Copy component definitions from the published model as a reference
- Make your changes in the new model
- Test thoroughly before publishing the new version
- Publish the new model when ready
Version Control Strategy:
- Use version numbers in model names (e.g., "CustomerModel_v1", "CustomerModel_v2")
- Document what changed between versions
- Keep old versions as historical record
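If you adopt the `_vN` suffix convention, the next model name can be derived mechanically. The helper below is a hypothetical convenience, not an SDCStudio feature, and it assumes that a name without a suffix counts as version 1.

```python
import re


def next_version_name(name: str) -> str:
    """Return the follow-up name, e.g. CustomerModel_v1 -> CustomerModel_v2."""
    match = re.fullmatch(r"(.*)_v([0-9]+)", name)
    if match is None:
        return f"{name}_v2"  # assumption: an unsuffixed name is version 1
    base, number = match.groups()
    return f"{base}_v{int(number) + 1}"


print(next_version_name("CustomerModel_v1"))
```

Parsing the number as an integer means the sequence continues correctly past single digits, for example from `_v9` to `_v10`.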
Best Practices
Design Principles
Keep It Simple:
- Start with core components
- Add complexity gradually
- Don't over-engineer initial models
Be Consistent:
- Use consistent naming conventions
- Apply similar validation rules to similar fields
- Maintain logical grouping in clusters
Document Everything:
- Add clear descriptions to all components
- Explain validation rules
- Document business logic
Think Long-Term:
- Design for reusability
- Consider future extensions
- Plan for data evolution
Component Organization
Logical Grouping:
- Group related components in a cluster
- Keep clusters flat: all components are direct children
- Use meaningful cluster names
Example: E-commerce Order
Order Model
└── Order Data Cluster
    ├── order_id
    ├── order_date
    ├── order_status
    ├── customer_id
    ├── customer_email
    ├── shipping_address
    ├── shipping_method
    ├── tracking_number
    ├── payment_method
    ├── total_amount
    └── payment_status
Validation Strategy
Balance Strictness:
- Too strict: Users can't enter valid data
- Too loose: Invalid data gets through
- Find the right balance for your use case
Common Sense Validation:
- Email: Pattern match + max length
- Phone: Pattern match + min/max length
- Dates: Reasonable range (not year 1000)
- Quantities: Non-negative (unless negatives valid)
Test Your Validation:
- Try valid edge cases
- Try invalid inputs
- Verify error messages make sense
Advanced Features
Semantic Definitions
Link your components to ontologies for better AI understanding:
- Edit Component and navigate to the Semantic Linking step
- Add Semantic Link:
  - Use the TransferList to select existing predicate-object pairs
  - Or create a new RDF predicate-object pair using the "Create New" button
  - Use the Google search button (magnifying glass icon) to search for controlled vocabulary terms. This opens a Google search pre-configured with ontology-focused sources (OBO Foundry, BioPortal, LOV, etc.)
- Save: Component now has semantic meaning
See Semantic Enhancement Guide for details.
Reference Components
Reuse existing components:
- Create Reference Component
- Select Target Component
- Configure Reference Type:
  - Direct reference
  - Copy with modifications
- Save: Reference is created
Component Templates
Save frequently used components as templates:
- Edit Component
- Click "Save as Template"
- Name Template: e.g., "Email Field Template"
- Use Template: Select when creating new components
Troubleshooting
Can't Save Component
Validation Errors:
- Check all required fields filled
- Verify regex pattern syntax
- Ensure min < max for ranges
- Check for duplicate names
Permission Issues:
- Verify you have edit permissions
- Check that the model isn't locked
- Check whether the component is already published; published components cannot be edited or unpublished, so create a new version instead
Component Not Appearing
Cache Issues:
- Refresh the page
- Clear browser cache
- Try different browser
Cluster Issues:
- Check the component belongs to the correct cluster
- Verify component saved successfully
- Look in "All Components" view
Validation Fails
Review Error Messages:
- Read validation errors carefully
- Fix one error at a time
- Re-validate after each fix
Common Fixes:
- Fix regex pattern syntax
- Adjust range constraints
- Rename duplicate components
- Complete required fields
Next Steps
- Semantic Enhancement - Add ontologies for better AI
- Generating Outputs - Create schemas and apps
- AI Processing Guide - Understand AI analysis
- Data Types Reference - Complete type documentation
Getting Help
- Troubleshooting Guide - Common issues
- Model Components Reference - Complete component reference
- Support: support@axius-sdc.com
Ready to build? Start creating and customizing your data models today!