SDCStudio DMGEN Model Definitions
This document provides a comprehensive overview of all the data model components (models) available in the SDCStudio DMGEN system, their constraints, and intended uses.
Table of Contents
- Overview
- Core Data Types
- Quantified Data Types
- List Data Types
- Structural Components
- Supporting Components
- Reference Components
- Metadata and Organization
Overview
The SDCStudio DMGEN system implements the SDC4 (Semantic Data Charter) Reference Model, providing a comprehensive framework for creating semantic data models. All models use CUID2 (Collision-resistant Unique IDentifier) as their primary key and follow a consistent pattern of inheritance from base classes.
Model Hierarchy
Common (Abstract Base)
├── XdAny (Abstract Base for all data types)
│   ├── XdString (Text data)
│   ├── XdToken (Normalized text)
│   ├── XdBoolean (True/false values)
│   ├── XdLink (URI references)
│   ├── XdFile (File attachments)
│   ├── XdOrdered (Abstract for ordered values)
│   │   ├── XdOrdinal (Rankings and scores)
│   │   ├── XdTemporal (Date/time values)
│   │   └── XdQuantified (Abstract for numeric values)
│   │       ├── XdCount (Integer values)
│   │       ├── XdQuantity (Decimal with units)
│   │       ├── XdFloat (32-bit floating point)
│   │       └── XdDouble (64-bit floating point)
│   └── List Types (Collections of values)
├── Structural Components
│   ├── Cluster (Grouping structure)
│   ├── Party (Actor/Role representation)
│   ├── Participation (Activity participation)
│   ├── Audit (Audit trail)
│   └── Attestation (Data verification)
└── Supporting Components
    ├── Units (Unit definitions)
    ├── ReferenceRange (Value ranges)
    ├── SimpleReferenceRange (Simple ranges)
    └── XdInterval (Interval definitions)
Core Data Types
XdString
Purpose: Text data with various constraints and validation rules.
Intended Uses:
- Names, descriptions, and free text
- Codes and identifiers
- Email addresses, URLs, and formatted text
- Enumerated values with descriptions
Constraints:
- min_length: Minimum character count
- max_length: Maximum character count
- exact_length: Fixed character count (for codes/identifiers)
- enums: List of allowed values (one per line)
- enum_descr: Descriptions for enumerated values
- definitions: URIs defining each enumeration
- def_val: Default value (up to 255 characters)
- str_fmt: Regular expression pattern for format validation
Constraint Priority: Enumeration > Exact Length > Min/Max Lengths > Default value
Examples:
- Observation notes (min_length: 10, max_length: 500)
- Diagnosis codes (enums: ICD-10 codes, exact_length: 7)
- Medication names (min_length: 1, max_length: 100)
- Email addresses (str_fmt: email regex pattern)
Note: Person names and identity should use Party components in the DM's subject section, not XdString in the data Cluster.
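The constraint priority above can be read as "only the highest-priority constraint that is set gets checked". The sketch below illustrates that reading; it is an assumption about how DMGEN resolves conflicts, not a copy of its implementation. `StringSpec` and `check_string` are hypothetical names, and `str_fmt` and `def_val` are left out because the priority list does not say where a format pattern ranks and a default value is not a validity check.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StringSpec:
    """Hypothetical container that mirrors the XdString constraint names."""
    enums: List[str] = field(default_factory=list)
    exact_length: Optional[int] = None
    min_length: Optional[int] = None
    max_length: Optional[int] = None


def check_string(value: str, spec: StringSpec) -> bool:
    """Check a value against the highest-priority constraint that is set."""
    if spec.enums:
        # 1. An enumeration overrides every length constraint.
        return value in spec.enums
    if spec.exact_length is not None:
        # 2. An exact length overrides min/max lengths.
        return len(value) == spec.exact_length
    # 3. Otherwise fall back to the optional min/max bounds.
    if spec.min_length is not None and len(value) < spec.min_length:
        return False
    if spec.max_length is not None and len(value) > spec.max_length:
        return False
    return True
```

For example, with `StringSpec(enums=["A", "BB"], exact_length=1)` the value `"BB"` is accepted, because the enumeration takes precedence over the exact length.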
XdToken
Purpose: Normalized text where whitespace is standardized.
Intended Uses:
- Codes and identifiers that may have inconsistent spacing
- Normalized text fields
- Searchable text content
Constraints: Same as XdString, plus:
- language: Language specification for the token
Examples:
- Product codes with inconsistent spacing
- Normalized names and titles
- Searchable keywords
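As an illustration of what "standardized whitespace" could mean, the sketch below applies the whitespace rules of the XML Schema `xs:token` type: tabs and line breaks become spaces, runs of spaces collapse to one, and leading and trailing spaces are removed. That XdToken follows exactly these rules is an assumption, and `normalize_token` is a hypothetical helper name.

```python
import re


def normalize_token(text: str) -> str:
    """Collapse space, tab, LF, and CR runs to one space and trim the ends."""
    collapsed = re.sub(r"[ \t\n\r]+", " ", text)
    return collapsed.strip(" ")
```

With this helper, two product codes that differ only in spacing, such as `"AB  123"` and `" AB\t123 "`, normalize to the same string.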
XdBoolean
Purpose: True/false or yes/no decisions.
Intended Uses:
- Binary flags and indicators
- Presence/absence indicators
- Simple decision points
Constraints:
- trues: List of values representing TRUE (one per line)
- falses: List of values representing FALSE (one per line)
Important: Use XdBoolean only for genuine true/false decisions. Enumerated categories, including those with more than two values, should use XdString with enumerations (e.g., male/female).
Examples:
- "Is patient pregnant?" (trues: Yes, true, 1; falses: No, false, 0)
- "Has insurance?" (trues: Yes; falses: No)
- "Is active?" (trues: true, 1; falses: false, 0)
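The trues and falses lists can be thought of as a lookup from raw input to a boolean. The sketch below shows one way to apply them. The function name `to_bool` is hypothetical, and exact, case-sensitive matching is an assumption; the source does not state how DMGEN compares values.

```python
from typing import Sequence


def to_bool(raw: str, trues: Sequence[str], falses: Sequence[str]) -> bool:
    """Map a raw value to True/False using the configured value lists."""
    if raw in trues:
        return True
    if raw in falses:
        return False
    # Anything outside both lists is rejected rather than guessed at.
    raise ValueError(f"{raw!r} is not a recognized true or false value")
```

Rejecting unknown values keeps a third state such as "unknown" from slipping into a field that is meant to be strictly binary.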
XdLink
Purpose: References to external resources and URIs.
Intended Uses:
- Links to other data models
- External resource references
- Semantic relationships
Constraints:
- link: The URI that points to the linked item
- relation: Description of the relationship
- relation_uri: URI defining the relationship type
Examples:
- Links to related clinical guidelines
- References to external ontologies
- Links to supporting documentation
XdFile
Purpose: File attachments with MIME type specifications.
Intended Uses:
- Image attachments (photos, scans)
- Document attachments (PDFs, reports)
- Audio/video files
- Any binary content with defined MIME types
Constraints:
- mime_type: MIME type specification
- file_extension: Allowed file extensions
- max_size: Maximum file size in bytes
Examples:
- Patient photos (mime_type: image/jpeg)
- Medical reports (mime_type: application/pdf)
- Audio recordings (mime_type: audio/wav)
Quantified Data Types
XdCount
Purpose: Integer values with constraints and units.
Intended Uses:
- Countable quantities (pregnancies, steps, cigarettes)
- Identifiers and sequence numbers
- Discrete measurements
Constraints:
- min_magnitude: Minimum allowed value
- max_magnitude: Maximum allowed value
- total_digits: Maximum number of digits
- units: Required units specification (ForeignKey to Units)
- min_inclusive/max_inclusive: Inclusive range bounds
- min_exclusive/max_exclusive: Exclusive range bounds
Important: Not for physical quantities with standardized units (use XdQuantity instead).
Examples:
- Age (min_magnitude: 0, max_magnitude: 150, units: years)
- Number of children (min_magnitude: 0, max_magnitude: 20, units: count)
- Pregnancy count (min_magnitude: 0, max_magnitude: 20, units: pregnancies)
XdQuantity
Purpose: Decimal values with units and precision control.
Intended Uses:
- Physical measurements (weight, height, temperature)
- Scientific quantities
- Time durations
- Any decimal value requiring units
Constraints:
- min_magnitude/max_magnitude: Value range
- total_digits: Maximum total digits
- fraction_digits: Maximum decimal places
- units: Required units specification (ForeignKey to Units)
- min_inclusive/max_inclusive: Inclusive range bounds
- min_exclusive/max_exclusive: Exclusive range bounds
Examples:
- Weight (units: kg, min_magnitude: 0, max_magnitude: 500, fraction_digits: 2)
- Temperature (units: Celsius, min_magnitude: -50, max_magnitude: 60, fraction_digits: 1)
- Blood pressure (units: mmHg, min_magnitude: 0, max_magnitude: 300, fraction_digits: 0)
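To make the interplay of the range and digit constraints concrete, the sketch below checks a decimal value with Python's standard `decimal` module. It is an illustration under stated assumptions: `check_quantity` is a hypothetical helper, min_magnitude and max_magnitude are treated as inclusive bounds, and digits are counted as `Decimal` stores them, which may differ from how DMGEN or an XSD validator counts them in edge cases.

```python
from decimal import Decimal
from typing import Optional


def check_quantity(
    value: str,
    *,
    min_magnitude: Optional[str] = None,
    max_magnitude: Optional[str] = None,
    total_digits: Optional[int] = None,
    fraction_digits: Optional[int] = None,
) -> bool:
    """Validate a decimal magnitude against range and digit limits."""
    d = Decimal(value)
    if not d.is_finite():
        return False  # NaN and Infinity are never valid magnitudes here
    _sign, digits, exponent = d.as_tuple()
    # Range checks (assumed inclusive).
    if min_magnitude is not None and d < Decimal(min_magnitude):
        return False
    if max_magnitude is not None and d > Decimal(max_magnitude):
        return False
    # A negative exponent is the number of decimal places.
    if fraction_digits is not None and max(0, -exponent) > fraction_digits:
        return False
    # Coefficient digits plus any trailing zeros implied by the exponent.
    if total_digits is not None and len(digits) + max(0, exponent) > total_digits:
        return False
    return True
```

Values are passed as strings so that `Decimal` sees the digits as written, rather than a binary floating point approximation.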
XdFloat
Purpose: 32-bit floating point numbers with units.
Intended Uses:
- Scientific calculations requiring single precision
- Performance-critical numeric data
- Legacy system compatibility
Constraints: Same as XdQuantity but optimized for 32-bit precision.
Examples:
- Sensor readings requiring single precision
- Legacy system numeric fields
- Performance-critical calculations
XdDouble
Purpose: 64-bit floating point numbers with units.
Intended Uses:
- High-precision scientific calculations
- Financial data requiring double precision
- Critical measurements
Constraints: Same as XdQuantity but optimized for 64-bit precision.
Examples:
- Financial calculations (currency amounts)
- High-precision scientific measurements
- Critical medical measurements
List Data Types
SDC4 List types are organized into two categories based on their semantic meaning and inheritance structure:
Non-Quantified List Types (Inherit from XdAny)
XdStringList, XdTokenList, XdBooleanList
Purpose: Collections of string, token, or boolean values that don't represent measured quantities.
Intended Uses:
- Multiple choice selections
- Tag collections
- List-based data
- Categorical data collections
Constraints:
- min_items: Minimum number of items
- max_items: Maximum number of items
- exact_items: Fixed number of items
- allow_duplicates: Whether duplicate values are allowed
- ordered: Whether order is significant
Examples:
- Multiple diagnoses (XdStringList)
- Tags and keywords (XdTokenList)
- Multiple flags (XdBooleanList)
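The list constraints above amount to a handful of size and uniqueness checks. The sketch below shows them together; `check_list` is a hypothetical helper, and the `ordered` flag is omitted because it describes whether order carries meaning rather than something a single list can be validated against.

```python
from typing import Hashable, Optional, Sequence


def check_list(
    items: Sequence[Hashable],
    *,
    min_items: Optional[int] = None,
    max_items: Optional[int] = None,
    exact_items: Optional[int] = None,
    allow_duplicates: bool = True,
) -> bool:
    """Validate item count and uniqueness for a list-type component."""
    count = len(items)
    if exact_items is not None and count != exact_items:
        return False
    if min_items is not None and count < min_items:
        return False
    if max_items is not None and count > max_items:
        return False
    # A set drops repeats, so a shorter set means duplicates were present.
    if not allow_duplicates and len(set(items)) != count:
        return False
    return True
```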
Quantified List Types (Inherit from XdQuantified)
Quantified List types represent collections of measured values and inherit from XdQuantified, providing access to magnitude status, error margins, and accuracy fields.
XdIntegerList, XdNonNegativeIntegerList, XdPositiveIntegerList
Purpose: Collections of integer values with different constraints for measured quantities.
Intended Uses:
- Multiple counts or identifiers
- Sequence numbers
- Index collections
- Measured integer quantities
Constraints: Same as other list types, plus:
- units: Required - The units for all values in the list (e.g., "counts", "items", "people")
- require_ms / allow_ms: Control magnitude status field generation
- require_error / allow_error: Control error margin field generation
- require_accuracy / allow_accuracy: Control precision digits field generation
- XdNonNegativeIntegerList: Only non-negative integers (≥ 0)
- XdPositiveIntegerList: Only positive integers (> 0)
Examples:
- Multiple ages in years (XdIntegerList, units: "years")
- Count collections (XdNonNegativeIntegerList, units: "items")
- ID collections (XdPositiveIntegerList, units: "identifiers")
XdDecimalList, XdDoubleList
Purpose: Collections of decimal/float values representing measured quantities.
Intended Uses:
- Multiple measurements
- Calculation results
- Numeric collections with units
- Scientific data collections
Constraints: Same as other list types, plus:
- units: Required - The units for all values in the list (e.g., "degrees Celsius", "kg", "mmHg")
- require_ms / allow_ms: Control magnitude status field generation
- require_error / allow_error: Control error margin field generation
- require_accuracy / allow_accuracy: Control precision digits field generation
Examples:
- Multiple temperature readings (XdDecimalList, units: "degrees Celsius")
- Multiple weights (XdDoubleList, units: "kg")
- Blood pressure readings (XdDecimalList, units: "mmHg")
- Laboratory values (XdDoubleList, units: "mg/dL")
Structural Components
Cluster
Purpose: The primary grouping structure for organizing data model components.
Intended Uses:
- Creating hierarchical data structures
- Grouping related fields
- Building complex data models
- Organizing form sections
Constraints:
- Can contain any combination of other components
- Supports nested clusters (recursive structure)
- Cannot contain itself (circular reference prevention)
Examples:
- Vital signs cluster (blood pressure, heart rate, temperature)
- Lab results cluster (blood tests, urinalysis)
- Clinical assessment cluster (cognitive tests, physical exam)
- Nested clusters for complex forms
Important: Identity information (patient name, ID, demographics) should NOT be in data Clusters. Use the subject Party component of the DM for identity information to enable privacy protection and de-identification.
Party
Purpose: Represents actors, roles, and entities in the data model. Critical for identity separation.
Intended Uses:
- Subject identification (who data is ABOUT) - use in DM's subject field
- Provider identification (primary service provider) - use in DM's provider field
- Healthcare organization details
- Any entity participating in the data model
Constraints:
- demographics: Optional demographic information (name, birth date, gender, etc.)
- identifiers: Multiple identification methods (SSN, MRN, etc.)
- addresses: Contact information
- communications: Communication preferences
Examples:
- Patient Party (for DM subject) - Contains name, ID, demographics
- Healthcare Provider Party (for DM provider) - Hospital, clinic, physician
- Participation Party (for DM participations) - Nurse, technician, consultant
Critical Privacy Principle: Identity information MUST be in Party components (DM's subject, provider, participations fields), NOT in the data Cluster. This architectural separation enables:
- HIPAA/GDPR compliance through de-identification
- Privacy-preserving data sharing (redact subject, keep clinical data)
- Clear separation of "who" from "what"
Participation
Purpose: Models participation of parties in activities.
Intended Uses:
- Healthcare provider roles
- Supporting staff participation
- Device/software participation
- Any entity involvement in processes
Constraints:
- performer: The participating party
- function: Role or function of the party
- mode: Method of participation (in-person, phone, etc.)
Examples:
- Primary care physician participation
- Nurse assistance participation
- Device monitoring participation
Audit
Purpose: Provides audit trail tracking for data changes.
Intended Uses:
- Compliance and regulatory requirements
- Data integrity tracking
- Change history
- Security monitoring
Constraints:
- audit_type: Type of audit event
- timestamp: When the event occurred
- performer: Who performed the action
- details: Additional audit information
Examples:
- Data modification audits
- Access control audits
- System event audits
Attestation
Purpose: Allows verification that data is correct and complete.
Intended Uses:
- Data quality assurance
- Compliance verification
- Quality control processes
- Certification requirements
Constraints:
- attestation_type: Type of attestation
- attester: Who provided the attestation
- timestamp: When attestation was given
- details: Attestation details
Examples:
- Data quality attestations
- Compliance attestations
- Quality control attestations
Supporting Components
Units
Purpose: Defines units of measurement for quantified data types.
Intended Uses:
- Standardizing measurement units
- Supporting international standards
- Providing unit definitions and URIs
Constraints:
- enums: Unit symbols/abbreviations (one per line)
- definitions: URIs defining each unit
- def_val: Default unit
- str_fmt: Format validation for unit strings
Examples:
- Weight units (kg, lb, g)
- Temperature units (Celsius, Fahrenheit)
- Time units (seconds, minutes, hours)
XdInterval
Purpose: Defines ranges and intervals for ordered values.
Intended Uses:
- Value ranges for validation
- Time periods
- Numeric intervals
- Any bounded range of values
Constraints:
- lower: Lower bound value
- upper: Upper bound value
- lower_included: Whether lower bound is inclusive
- upper_included: Whether upper bound is inclusive
- lower_bounded: Whether lower bound exists
- upper_bounded: Whether upper bound exists
Examples:
- Age ranges (0-18, 19-65, 65+)
- Temperature ranges (36.5-37.5°C)
- Time periods (2020-2023)
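The six XdInterval fields combine into a single membership test: a bound applies only if it exists, and the inclusion flag decides whether the boundary value itself counts. The sketch below is a minimal illustration using numeric bounds; the `Interval` class and its `contains` method are hypothetical, and the defaults (bounded and inclusive) are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Interval:
    """Hypothetical mirror of the XdInterval constraint fields."""
    lower: Optional[float] = None
    upper: Optional[float] = None
    lower_included: bool = True
    upper_included: bool = True
    lower_bounded: bool = True
    upper_bounded: bool = True

    def contains(self, x: float) -> bool:
        """Return True if x falls inside the interval."""
        if self.lower_bounded and self.lower is not None:
            if x < self.lower or (x == self.lower and not self.lower_included):
                return False
        if self.upper_bounded and self.upper is not None:
            if x > self.upper or (x == self.upper and not self.upper_included):
                return False
        return True
```

An open-ended range such as "65+" from the examples would be written as `Interval(lower=65, upper_bounded=False)`.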
ReferenceRange
Purpose: Defines named ranges with semantic meaning.
Intended Uses:
- Normal value ranges
- Critical value ranges
- Therapeutic ranges
- Any semantically meaningful value range
Constraints:
- definition: Semantic meaning (normal, critical, therapeutic)
- interval: Associated XdInterval
- is_normal: Whether this is considered normal
Examples:
- Normal blood pressure ranges
- Critical lab value ranges
- Therapeutic drug level ranges
SimpleReferenceRange
Purpose: Simplified reference range with built-in interval definition.
Intended Uses:
- Quick definition of value ranges
- Simple normal ranges
- Basic validation ranges
Constraints:
- definition: Semantic meaning
- lower/upper: Range bounds
- interval_type: Data type (int, decimal, float, dateTime, etc.)
- units_name/units_uri: Unit information
- is_normal: Whether this is normal range
Examples:
- Normal temperature range (36.5-37.5°C)
- Age ranges for different life stages
- Weight ranges for different categories
Metadata and Organization
Project
Purpose: Organizes data models and components by project.
Intended Uses:
- Team collaboration
- Access control
- Resource organization
- Workflow management
Constraints:
- name: Project name (unique per owner)
- description: Project description
- owner: Project owner
- team: Associated team
- is_public: Public visibility
- is_default_library: Default library status
Modeler
Purpose: Represents users who create and manage data models.
Intended Uses:
- User identification
- Author attribution
- Contributor tracking
- Default project assignment
Constraints:
- user: Associated Django user
- name: Display name for metadata
- email: Contact email
- timezone: User's timezone
- project: Default project
- prj_filter: Project filtering preference
NS (Namespace)
Purpose: Manages namespaces and abbreviations for semantic links.
Intended Uses:
- RDF namespace management
- Ontology integration
- Semantic linking
- Vocabulary organization
Constraints:
- abbrev: Namespace abbreviation (unique)
- uri: Full namespace URI
- description: Namespace description
Predicate
Purpose: Defines predicates for RDF triples.
Intended Uses:
- Semantic relationship definitions
- Ontology predicate management
- RDF triple creation
Constraints:
- ns_abbrev: Associated namespace
- class_name: Predicate name
- Unique combination of namespace and class name
PredObj (Predicate Object)
Purpose: Represents RDF triple objects for semantic linking.
Intended Uses:
- Semantic annotation
- Ontology integration
- Knowledge representation
- Semantic data modeling
Constraints:
- po_name: Human-readable name
- predicate: Associated predicate
- object_uri: Object URI
- project: Associated project
DM (Data Model)
Purpose: The root node of a complete data model with architectural separation of identity and clinical data.
Intended Uses:
- Complete data model definition
- Schema generation with privacy-by-design
- Documentation creation
- Application generation
DM Structure - Architectural Sections:
- data (Cluster) - Clinical/domain data ONLY
  - Contains observations, measurements, assessments, results
  - Should NOT contain identity information
  - Can be shared without identity when subject is redacted
- subject (Party) - Who data is ABOUT
  - Patient, research participant, person of interest
  - Contains identifying information (name, ID, demographics)
  - Can be redacted for de-identification
- provider (Party) - Primary provider of services/care
  - Healthcare organization, government agency, research institution
  - Primary responsible entity
- participations (ManyToMany Participation) - Other parties involved
  - Nurses, consultants, technicians, assistants
  - Any additional entities in data creation process
- audit (ManyToMany Audit) - Audit trail
- attestation (Attestation) - Data verification
- protocol (XdString) - External protocol ID
- workflow (XdLink) - Workflow engine identifier
- acs (XdLink) - Access control system
- links (ManyToMany XdLink) - Ad-hoc links
Privacy-by-Design Architecture:
- ✅ Clinical data in data Cluster
- ✅ Identity in subject Party
- ✅ Separation enables de-identification
- ❌ Never mix identity into data Cluster
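Because identity lives only in the subject section, de-identification can be as simple as dropping that section and leaving the data Cluster untouched. The sketch below illustrates the idea on a plain dictionary. The dictionary layout and the `deidentify` helper are hypothetical and are not the format of DMGEN's generated XML or JSON instances.

```python
import copy
from typing import Any, Dict


def deidentify(dm_instance: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the instance with the subject section redacted."""
    redacted = copy.deepcopy(dm_instance)  # never mutate the original record
    redacted["subject"] = None
    return redacted


# Illustrative record: identity and clinical data are already separated.
record = {
    "subject": {"name": "Jane Example", "mrn": "000123"},
    "data": {"heart_rate": 72, "temperature_c": 36.8},
}
shareable = deidentify(record)
```

This only works if the separation is respected: an identifier placed in the data Cluster would survive the redaction.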
Generated Outputs:
- XSD schema files
- XML instance documents
- JSON representations
- JSON-LD schema (semantic descriptions)
- HTML documentation
- RDF triples for semantic web
- SHACL constraint files
- GQL CREATE statements
Common Features Across All Models
Base Constraints (Common class)
All models inherit these common features:
- project: Associated project
- public: Public visibility
- label: Human-readable label
- ct_id: CUID2 unique identifier
- created/updated: Timestamps
- published: Publication status
- description: Detailed description
- pred_obj: Semantic links
- schema_code: Generated schema code
- lang: Language specification
- creator/edited_by: Attribution
- seq: Sequence number for UI ordering
- validate: Hard validation flag
- app_code: Generated application code
XdAny Features
All data types inherit these features:
- adapter_ctid: Adapter identifier
- require_act/allow_act: Access control requirements
- require_vtb/allow_vtb: Valid time begin
- require_vte/allow_vte: Valid time end
- require_tr/allow_tr: Time recorded
- require_mod/allow_mod: Time modified
- require_location/allow_location: Location requirements
- ui_type: Preferred UI type (input, dropdown, radio, etc.)
Publication Workflow
All models follow a two-stage publication process:
1. Draft: Models can be created and modified
2. Published: Models are locked and can be used in data models
Published models cannot be deleted and are available for use in data model generation.
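The draft and published stages behave like a one-way lock. The sketch below models that behavior with a small class; `ModelComponent`, `PublishedModelError`, and the method names are hypothetical, and the actual enforcement in DMGEN may be implemented differently.

```python
from dataclasses import dataclass


class PublishedModelError(Exception):
    """Raised when a locked (published) model is modified."""


@dataclass
class ModelComponent:
    """Hypothetical stand-in for any model with a published flag."""
    label: str
    published: bool = False

    def rename(self, new_label: str) -> None:
        """Edit the label; allowed only while the model is a draft."""
        if self.published:
            raise PublishedModelError("published models are locked")
        self.label = new_label

    def publish(self) -> None:
        """Move from draft to published; there is no way back in this sketch."""
        self.published = True

    def can_delete(self) -> bool:
        """Published models cannot be deleted."""
        return not self.published
```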
Best Practices
Model Selection
- Use XdString for: Text data, codes, identifiers, enumerated values
- Use XdBoolean for: True/false decisions only (not multi-value enums)
- Use XdCount for: Integer values with units (counts, quantities)
- Use XdQuantity for: Decimal values with units (measurements)
- Use XdTemporal for: Date/time values with flexible precision
- Use XdOrdinal for: Rankings, scores, and ordered categories
- Use Clusters for: Grouping related fields and creating hierarchy
Constraint Guidelines
- Start with minimal constraints and add more as needed
- Use enumerations for controlled vocabularies
- Set appropriate ranges for numeric values
- Define units for all quantified values
- Use reference ranges for normal/abnormal value definitions
- Add semantic links for ontology integration
Naming Conventions
- Use descriptive labels that will appear in generated UIs
- Follow consistent naming within projects
- Use clear, concise descriptions
- Include usage examples in descriptions
Publication Strategy
- Create and test models in draft mode
- Review and validate before publication
- Use published models in data models
- Version control through project organization
This comprehensive model system provides the foundation for creating robust, semantic data models that can be used across various domains and applications.