SDCStudio DMGEN Model Definitions

This document provides a comprehensive overview of all the data model components (models) available in the SDCStudio DMGEN system, their constraints, and intended uses.

Overview

The SDCStudio DMGEN system implements the SDC4 (Semantic Data Charter) Reference Model, providing a comprehensive framework for creating semantic data models. All models use CUID2 (Collision-resistant Unique IDentifier) as their primary key and follow a consistent pattern of inheritance from base classes.

Model Hierarchy

Common (Abstract Base)
├── XdAny (Abstract Base for all data types)
│   ├── XdString (Text data)
│   ├── XdToken (Normalized text)
│   ├── XdBoolean (True/false values)
│   ├── XdLink (URI references)
│   ├── XdFile (File attachments)
│   ├── XdOrdered (Abstract for ordered values)
│   │   ├── XdOrdinal (Rankings and scores)
│   │   ├── XdTemporal (Date/time values)
│   │   └── XdQuantified (Abstract for numeric values)
│   │       ├── XdCount (Integer values)
│   │       ├── XdQuantity (Decimal with units)
│   │       ├── XdFloat (32-bit floating point)
│   │       └── XdDouble (64-bit floating point)
│   └── List Types (Collections of values)
├── Structural Components
│   ├── Cluster (Grouping structure)
│   ├── Party (Actor/Role representation)
│   ├── Participation (Activity participation)
│   ├── Audit (Audit trail)
│   └── Attestation (Data verification)
└── Supporting Components
    ├── Units (Unit definitions)
    ├── ReferenceRange (Value ranges)
    ├── SimpleReferenceRange (Simple ranges)
    └── XdInterval (Interval definitions)

Core Data Types

XdString

Purpose: Text data with various constraints and validation rules.

Intended Uses:
  - Names, descriptions, and free text
  - Codes and identifiers
  - Email addresses, URLs, and formatted text
  - Enumerated values with descriptions

Constraints:
  - min_length: Minimum character count
  - max_length: Maximum character count
  - exact_length: Fixed character count (for codes/identifiers)
  - enums: List of allowed values (one per line)
  - enum_descr: Descriptions for enumerated values
  - definitions: URIs defining each enumeration
  - def_val: Default value (up to 255 characters)
  - str_fmt: Regular expression pattern for format validation

Constraint Priority: Enumeration > Exact Length > Min/Max Lengths > Default value

Examples:
  - Observation notes (min_length: 10, max_length: 500)
  - Diagnosis codes (enums: ICD-10 codes, exact_length: 7)
  - Medication names (min_length: 1, max_length: 100)
  - Email addresses (str_fmt: email regex pattern)

Note: Person names and identity should use Party components in the DM's subject section, not XdString in the data Cluster.
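
The XdString constraint priority can be read as "the highest-priority constraint that is set decides the check". The Python sketch below assumes that reading. `check_xd_string` and its keyword arguments are hypothetical names that mirror the constraint fields, not SDCStudio's API, and the default value and `str_fmt` are left out for brevity.

```python
from typing import Optional, Sequence


def check_xd_string(
    value: str,
    enums: Optional[Sequence[str]] = None,
    exact_length: Optional[int] = None,
    min_length: Optional[int] = None,
    max_length: Optional[int] = None,
) -> bool:
    """Check a value against the highest-priority constraint that is set.

    Assumed reading of the priority order: enums > exact_length > min/max.
    """
    if enums:  # an enumeration supersedes every length constraint
        return value in enums
    if exact_length is not None:  # a fixed length supersedes min/max
        return len(value) == exact_length
    if min_length is not None and len(value) < min_length:
        return False
    if max_length is not None and len(value) > max_length:
        return False
    return True
```

Under this reading, a value that appears in `enums` passes even if an `exact_length` is also configured, because the enumeration is consulted first.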

XdToken

Purpose: Normalized text where whitespace is standardized.

Intended Uses:
  - Codes and identifiers that may have inconsistent spacing
  - Normalized text fields
  - Searchable text content

Constraints: Same as XdString, plus:
  - language: Language specification for the token

Examples:
  - Product codes with inconsistent spacing
  - Normalized names and titles
  - Searchable keywords
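
As an illustration of whitespace normalization, the sketch below follows the rules of the XML Schema `token` type (tabs and line breaks become spaces, runs of spaces collapse to one, leading and trailing spaces are dropped). That XdToken normalizes exactly this way is an assumption, and `normalize_token` is a hypothetical helper name.

```python
import re


def normalize_token(text: str) -> str:
    """Collapse whitespace the way XML Schema's xs:token does."""
    # Tabs, line feeds and carriage returns count as whitespace alongside spaces.
    collapsed = re.sub(r"[ \t\n\r]+", " ", text)
    # Leading and trailing spaces are dropped.
    return collapsed.strip(" ")
```

With this helper, two product codes that differ only in spacing normalize to the same string and can be compared or searched directly.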

XdBoolean

Purpose: True/false or yes/no decisions.

Intended Uses:
  - Binary flags and indicators
  - Presence/absence indicators
  - Simple decision points

Constraints:
  - trues: List of values representing TRUE (one per line)
  - falses: List of values representing FALSE (one per line)

Important: Use XdBoolean only for genuine true/false decisions. Enumerated categories that can take more than two values should use XdString with enumerations, even when they look two-valued (e.g., male/female).

Examples:
  - "Is patient pregnant?" (trues: Yes, true, 1; falses: No, false, 0)
  - "Has insurance?" (trues: Yes; falses: No)
  - "Is active?" (trues: true, 1; falses: false, 0)
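
The trues and falses lists can be thought of as a lookup from recorded values to a boolean. The sketch below shows that idea in Python. `parse_xd_boolean` is a hypothetical helper, and exact, case-sensitive matching is an assumption rather than documented behavior.

```python
from typing import Iterable


def parse_xd_boolean(raw: str, trues: Iterable[str], falses: Iterable[str]) -> bool:
    """Map a raw value to True/False using the configured trues/falses lists."""
    if raw in set(trues):
        return True
    if raw in set(falses):
        return False
    # Anything outside both lists is rejected rather than guessed at.
    raise ValueError(f"{raw!r} is not listed in trues or falses")
```

Rejecting unlisted values keeps a third answer such as "Unknown" from silently becoming FALSE, which is the situation where an XdString enumeration fits better.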

XdLink

Purpose: References to external resources and URIs.

Intended Uses:
  - Links to other data models
  - External resource references
  - Semantic relationships

Constraints:
  - link: The URI that points to the linked item
  - relation: Description of the relationship
  - relation_uri: URI defining the relationship type

Examples:
  - Links to related clinical guidelines
  - References to external ontologies
  - Links to supporting documentation

XdFile

Purpose: File attachments with MIME type specifications.

Intended Uses:
  - Image attachments (photos, scans)
  - Document attachments (PDFs, reports)
  - Audio/video files
  - Any binary content with defined MIME types

Constraints:
  - mime_type: MIME type specification
  - file_extension: Allowed file extensions
  - max_size: Maximum file size in bytes

Examples:
  - Patient photos (mime_type: image/jpeg)
  - Medical reports (mime_type: application/pdf)
  - Audio recordings (mime_type: audio/wav)

Quantified Data Types

XdCount

Purpose: Integer values with constraints and units.

Intended Uses:
  - Countable quantities (pregnancies, steps, cigarettes)
  - Identifiers and sequence numbers
  - Discrete measurements

Constraints:
  - min_magnitude: Minimum allowed value
  - max_magnitude: Maximum allowed value
  - total_digits: Maximum number of digits
  - units: Required units specification (ForeignKey to Units)
  - min_inclusive/max_inclusive: Inclusive range bounds
  - min_exclusive/max_exclusive: Exclusive range bounds

Important: Not for physical quantities with standardized units (use XdQuantity instead).

Examples:
  - Age (min_magnitude: 0, max_magnitude: 150, units: years)
  - Number of children (min_magnitude: 0, max_magnitude: 20, units: count)
  - Pregnancy count (min_magnitude: 0, max_magnitude: 20, units: pregnancies)

XdQuantity

Purpose: Decimal values with units and precision control.

Intended Uses:
  - Physical measurements (weight, height, temperature)
  - Scientific quantities
  - Time durations
  - Any decimal value requiring units

Constraints:
  - min_magnitude/max_magnitude: Value range
  - total_digits: Maximum total digits
  - fraction_digits: Maximum decimal places
  - units: Required units specification (ForeignKey to Units)
  - min_inclusive/max_inclusive: Inclusive range bounds
  - min_exclusive/max_exclusive: Exclusive range bounds

Examples:
  - Weight (units: kg, min_magnitude: 0, max_magnitude: 500, fraction_digits: 2)
  - Temperature (units: Celsius, min_magnitude: -50, max_magnitude: 60, fraction_digits: 1)
  - Blood pressure (units: mmHg, min_magnitude: 0, max_magnitude: 300, fraction_digits: 0)
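
To make the bound and digit constraints concrete, here is a small Python sketch using `decimal.Decimal`. The function names are hypothetical, and the interpretation of total_digits and fraction_digits follows the XML Schema facets of the same name, which is an assumption about how SDCStudio applies them. Only the inclusive/exclusive bounds are checked, since the text does not say how min_magnitude and max_magnitude relate to them.

```python
from decimal import Decimal
from typing import Optional, Tuple


def digit_counts(value: Decimal) -> Tuple[int, int]:
    """Return (total digits, fraction digits) of a finite decimal value."""
    _, digits, exponent = value.normalize().as_tuple()
    fraction = max(0, -exponent)
    if exponent >= 0:
        total = len(digits) + exponent  # e.g. 5E+2 has 3 digits
    else:
        total = max(len(digits), fraction)  # e.g. 0.05 needs 2 digits
    return total, fraction


def check_quantity(
    value: Decimal,
    *,
    min_inclusive: Optional[Decimal] = None,
    max_inclusive: Optional[Decimal] = None,
    min_exclusive: Optional[Decimal] = None,
    max_exclusive: Optional[Decimal] = None,
    total_digits: Optional[int] = None,
    fraction_digits: Optional[int] = None,
) -> bool:
    """Check a magnitude against whichever bounds and digit limits are set."""
    if min_inclusive is not None and value < min_inclusive:
        return False
    if max_inclusive is not None and value > max_inclusive:
        return False
    if min_exclusive is not None and value <= min_exclusive:
        return False
    if max_exclusive is not None and value >= max_exclusive:
        return False
    total, fraction = digit_counts(value)
    if total_digits is not None and total > total_digits:
        return False
    if fraction_digits is not None and fraction > fraction_digits:
        return False
    return True
```

For the weight example, a call such as `check_quantity(Decimal("72.50"), min_inclusive=Decimal("0"), max_inclusive=Decimal("500"), fraction_digits=2)` passes, while `Decimal("72.505")` fails the fraction_digits limit.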

XdFloat

Purpose: 32-bit floating point numbers with units.

Intended Uses:
  - Scientific calculations requiring single precision
  - Performance-critical numeric data
  - Legacy system compatibility

Constraints: Same as XdQuantity, with values limited to 32-bit (single) precision.

Examples:
  - Sensor readings requiring single precision
  - Legacy system numeric fields
  - Performance-critical calculations

XdDouble

Purpose: 64-bit floating point numbers with units.

Intended Uses:
  - High-precision scientific calculations
  - Financial data requiring double precision
  - Critical measurements

Constraints: Same as XdQuantity, with values limited to 64-bit (double) precision.

Examples:
  - Financial calculations (currency amounts)
  - High-precision scientific measurements
  - Critical medical measurements
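
The practical difference between XdFloat and XdDouble is how much of a value survives storage. The Python sketch below illustrates this with the standard `struct` module; it is a general illustration of IEEE 754 single versus double precision, not SDCStudio code, and `to_float32` is a hypothetical helper name.

```python
import struct


def to_float32(value: float) -> float:
    """Round-trip a Python float (64-bit) through IEEE 754 single precision."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


# 0.5 is exactly representable in both widths; 0.1 is not, and the
# 32-bit approximation differs from the 64-bit one.
exact = to_float32(0.5) == 0.5
drift = abs(to_float32(0.1) - 0.1)
```

Values that must round-trip unchanged, such as readings later compared for equality, are safer in the wider type or in XdQuantity's decimal representation.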

List Data Types

SDC4 List types are organized into two categories based on their semantic meaning and inheritance structure:

Non-Quantified List Types (Inherit from XdAny)

XdStringList, XdTokenList, XdBooleanList

Purpose: Collections of string, token, or boolean values that don't represent measured quantities.

Intended Uses:
  - Multiple choice selections
  - Tag collections
  - List-based data
  - Categorical data collections

Constraints:
  - min_items: Minimum number of items
  - max_items: Maximum number of items
  - exact_items: Fixed number of items
  - allow_duplicates: Whether duplicate values are allowed
  - ordered: Whether order is significant

Examples:
  - Multiple diagnoses (XdStringList)
  - Tags and keywords (XdTokenList)
  - Multiple flags (XdBooleanList)
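
The item-count and duplicate constraints for list types can be sketched as a single check. This is a minimal Python illustration; `check_list_items` is a hypothetical helper whose parameters mirror the constraint names, and the `ordered` flag is omitted because it describes meaning rather than validity.

```python
from typing import Hashable, Optional, Sequence


def check_list_items(
    items: Sequence[Hashable],
    min_items: Optional[int] = None,
    max_items: Optional[int] = None,
    exact_items: Optional[int] = None,
    allow_duplicates: bool = True,
) -> bool:
    """Check list size and duplicate rules for a collection of values."""
    count = len(items)
    if exact_items is not None and count != exact_items:
        return False
    if min_items is not None and count < min_items:
        return False
    if max_items is not None and count > max_items:
        return False
    # A set drops repeated values, so a shorter set means duplicates exist.
    if not allow_duplicates and len(set(items)) != count:
        return False
    return True
```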

Quantified List Types (Inherit from XdQuantified)

Quantified List types represent collections of measured values and inherit from XdQuantified, providing access to magnitude status, error margins, and accuracy fields.

XdIntegerList, XdNonNegativeIntegerList, XdPositiveIntegerList

Purpose: Collections of integer values with different constraints for measured quantities.

Intended Uses:
  - Multiple counts or identifiers
  - Sequence numbers
  - Index collections
  - Measured integer quantities

Constraints: Same as other list types, plus:
  - units: Required. The units for all values in the list (e.g., "counts", "items", "people")
  - require_ms / allow_ms: Control magnitude status field generation
  - require_error / allow_error: Control error margin field generation
  - require_accuracy / allow_accuracy: Control precision digits field generation
  - XdNonNegativeIntegerList: Only non-negative integers (≥ 0)
  - XdPositiveIntegerList: Only positive integers (> 0)

Examples:
  - Multiple ages in years (XdIntegerList, units: "years")
  - Count collections (XdNonNegativeIntegerList, units: "items")
  - ID collections (XdPositiveIntegerList, units: "identifiers")
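
The three integer list types differ only in the lowest value they accept. A minimal Python sketch of that rule follows; the lookup table and `check_integer_list` are hypothetical names used for illustration.

```python
from typing import Dict, Optional, Sequence

# Lower bound implied by each list type; None means unbounded.
LOWER_BOUNDS: Dict[str, Optional[int]] = {
    "XdIntegerList": None,
    "XdNonNegativeIntegerList": 0,
    "XdPositiveIntegerList": 1,
}


def check_integer_list(values: Sequence[int], list_type: str) -> bool:
    """Return True if every value satisfies the list type's lower bound."""
    lower = LOWER_BOUNDS[list_type]
    return all(lower is None or v >= lower for v in values)
```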

XdDecimalList, XdDoubleList

Purpose: Collections of decimal/float values representing measured quantities.

Intended Uses:
  - Multiple measurements
  - Calculation results
  - Numeric collections with units
  - Scientific data collections

Constraints: Same as other list types, plus:
  - units: Required. The units for all values in the list (e.g., "degrees Celsius", "kg", "mmHg")
  - require_ms / allow_ms: Control magnitude status field generation
  - require_error / allow_error: Control error margin field generation
  - require_accuracy / allow_accuracy: Control precision digits field generation

Examples:
  - Multiple temperature readings (XdDecimalList, units: "degrees Celsius")
  - Multiple weights (XdDoubleList, units: "kg")
  - Blood pressure readings (XdDecimalList, units: "mmHg")
  - Laboratory values (XdDoubleList, units: "mg/dL")

Structural Components

Cluster

Purpose: The primary grouping structure for organizing data model components.

Intended Uses:
  - Creating hierarchical data structures
  - Grouping related fields
  - Building complex data models
  - Organizing form sections

Constraints:
  - Can contain any combination of other components
  - Supports nested clusters (recursive structure)
  - Cannot contain itself (circular reference prevention)

Examples:
  - Vital signs cluster (blood pressure, heart rate, temperature)
  - Lab results cluster (blood tests, urinalysis)
  - Clinical assessment cluster (cognitive tests, physical exam)
  - Nested clusters for complex forms
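
The rule that a Cluster cannot contain itself, directly or through nesting, amounts to a cycle check before a child is attached. The Python sketch below shows one way to express it. The `Cluster` dataclass here is a simplified stand-in with hypothetical field names, not the SDCStudio model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)
class Cluster:
    label: str
    clusters: List["Cluster"] = field(default_factory=list)


def contains(ancestor: Cluster, target: Cluster) -> bool:
    """True if target is ancestor itself or nested anywhere beneath it."""
    if ancestor is target:
        return True
    return any(contains(child, target) for child in ancestor.clusters)


def add_child(parent: Cluster, child: Cluster) -> None:
    """Attach child to parent unless that would create a circular reference."""
    # If parent already sits inside child (or is child), nesting would loop.
    if contains(child, parent):
        raise ValueError(f"{child.label!r} cannot be nested inside {parent.label!r}")
    parent.clusters.append(child)
```

Because every attachment goes through `add_child`, the structure stays a tree-like hierarchy and the recursive walk in `contains` always terminates.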

Important: Identity information (patient name, ID, demographics) should NOT be in data Clusters. Use the subject Party component of the DM for identity information to enable privacy protection and de-identification.

Party

Purpose: Represents actors, roles, and entities in the data model. Critical for identity separation.

Intended Uses:
  - Subject identification (who the data is ABOUT): use in the DM's subject field
  - Provider identification (primary service provider): use in the DM's provider field
  - Healthcare organization details
  - Any entity participating in the data model

Constraints:
  - demographics: Optional demographic information (name, birth date, gender, etc.)
  - identifiers: Multiple identification methods (SSN, MRN, etc.)
  - addresses: Contact information
  - communications: Communication preferences

Examples:
  - Patient Party (for DM subject): contains name, ID, demographics
  - Healthcare Provider Party (for DM provider): hospital, clinic, physician
  - Participation Party (for DM participations): nurse, technician, consultant

Critical Privacy Principle: Identity information MUST be in Party components (DM's subject, provider, participations fields), NOT in the data Cluster. This architectural separation enables:
  - HIPAA/GDPR compliance through de-identification
  - Privacy-preserving data sharing (redact subject, keep clinical data)
  - Clear separation of "who" from "what"

Participation

Purpose: Models participation of parties in activities.

Intended Uses:
  - Healthcare provider roles
  - Supporting staff participation
  - Device/software participation
  - Any entity involvement in processes

Constraints:
  - performer: The participating party
  - function: Role or function of the party
  - mode: Method of participation (in-person, phone, etc.)

Examples:
  - Primary care physician participation
  - Nurse assistance participation
  - Device monitoring participation

Audit

Purpose: Provides audit trail tracking for data changes.

Intended Uses:
  - Compliance and regulatory requirements
  - Data integrity tracking
  - Change history
  - Security monitoring

Constraints:
  - audit_type: Type of audit event
  - timestamp: When the event occurred
  - performer: Who performed the action
  - details: Additional audit information

Examples:
  - Data modification audits
  - Access control audits
  - System event audits

Attestation

Purpose: Allows verification that data is correct and complete.

Intended Uses:
  - Data quality assurance
  - Compliance verification
  - Quality control processes
  - Certification requirements

Constraints:
  - attestation_type: Type of attestation
  - attester: Who provided the attestation
  - timestamp: When attestation was given
  - details: Attestation details

Examples:
  - Data quality attestations
  - Compliance attestations
  - Quality control attestations

Supporting Components

Units

Purpose: Defines units of measurement for quantified data types.

Intended Uses:
  - Standardizing measurement units
  - Supporting international standards
  - Providing unit definitions and URIs

Constraints:
  - enums: Unit symbols/abbreviations (one per line)
  - definitions: URIs defining each unit
  - def_val: Default unit
  - str_fmt: Format validation for unit strings

Examples:
  - Weight units (kg, lb, g)
  - Temperature units (Celsius, Fahrenheit)
  - Time units (seconds, minutes, hours)

XdInterval

Purpose: Defines ranges and intervals for ordered values.

Intended Uses:
  - Value ranges for validation
  - Time periods
  - Numeric intervals
  - Any bounded range of values

Constraints:
  - lower: Lower bound value
  - upper: Upper bound value
  - lower_included: Whether lower bound is inclusive
  - upper_included: Whether upper bound is inclusive
  - lower_bounded: Whether lower bound exists
  - upper_bounded: Whether upper bound exists

Examples:
  - Age ranges (0-18, 19-65, 65+)
  - Temperature ranges (36.5-37.5°C)
  - Time periods (2020-2023)
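
How the six XdInterval fields combine is easiest to see as a membership test. The Python sketch below is a simplified stand-in: the `Interval` dataclass and its `contains` method are hypothetical, with field names borrowed from the constraints listed for XdInterval.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass
class Interval:
    lower: Optional[Decimal] = None
    upper: Optional[Decimal] = None
    lower_included: bool = True
    upper_included: bool = True
    lower_bounded: bool = True
    upper_bounded: bool = True

    def contains(self, value: Decimal) -> bool:
        """True if value lies inside the interval, honoring each flag."""
        if self.lower_bounded and self.lower is not None:
            if value < self.lower:
                return False
            if value == self.lower and not self.lower_included:
                return False
        if self.upper_bounded and self.upper is not None:
            if value > self.upper:
                return False
            if value == self.upper and not self.upper_included:
                return False
        return True
```

An open-ended range such as "65+" would be written here as `Interval(lower=Decimal("65"), upper_bounded=False)`, and a half-open range sets `upper_included=False` so the upper bound itself is excluded.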

ReferenceRange

Purpose: Defines named ranges with semantic meaning.

Intended Uses:
  - Normal value ranges
  - Critical value ranges
  - Therapeutic ranges
  - Any semantically meaningful value range

Constraints:
  - definition: Semantic meaning (normal, critical, therapeutic)
  - interval: Associated XdInterval
  - is_normal: Whether this is considered normal

Examples:
  - Normal blood pressure ranges
  - Critical lab value ranges
  - Therapeutic drug level ranges

SimpleReferenceRange

Purpose: Simplified reference range with built-in interval definition.

Intended Uses:
  - Quick definition of value ranges
  - Simple normal ranges
  - Basic validation ranges

Constraints:
  - definition: Semantic meaning
  - lower/upper: Range bounds
  - interval_type: Data type (int, decimal, float, dateTime, etc.)
  - units_name/units_uri: Unit information
  - is_normal: Whether this is the normal range

Examples:
  - Normal temperature range (36.5-37.5°C)
  - Age ranges for different life stages
  - Weight ranges for different categories

Metadata and Organization

Project

Purpose: Organizes data models and components by project.

Intended Uses:
  - Team collaboration
  - Access control
  - Resource organization
  - Workflow management

Constraints:
  - name: Project name (unique per owner)
  - description: Project description
  - owner: Project owner
  - team: Associated team
  - is_public: Public visibility
  - is_default_library: Default library status

Modeler

Purpose: Represents users who create and manage data models.

Intended Uses:
  - User identification
  - Author attribution
  - Contributor tracking
  - Default project assignment

Constraints:
  - user: Associated Django user
  - name: Display name for metadata
  - email: Contact email
  - timezone: User's timezone
  - project: Default project
  - prj_filter: Project filtering preference

NS (Namespace)

Purpose: Manages namespaces and abbreviations for semantic links.

Intended Uses:
  - RDF namespace management
  - Ontology integration
  - Semantic linking
  - Vocabulary organization

Constraints:
  - abbrev: Namespace abbreviation (unique)
  - uri: Full namespace URI
  - description: Namespace description

Predicate

Purpose: Defines predicates for RDF triples.

Intended Uses:
  - Semantic relationship definitions
  - Ontology predicate management
  - RDF triple creation

Constraints:
  - ns_abbrev: Associated namespace
  - class_name: Predicate name
  - The combination of namespace and class name must be unique

PredObj (Predicate Object)

Purpose: Represents RDF triple objects for semantic linking.

Intended Uses:
  - Semantic annotation
  - Ontology integration
  - Knowledge representation
  - Semantic data modeling

Constraints:
  - po_name: Human-readable name
  - predicate: Associated predicate
  - object_uri: Object URI
  - project: Associated project
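
NS, Predicate, and PredObj together supply the pieces of an RDF triple: the namespace expands the predicate's abbreviation, and the PredObj provides the object URI. The Python sketch below assembles one N-Triples line from those pieces. The function name, the subject URI, and the `example.org` object URI are hypothetical; only the `rdfs` namespace URI is a real, published one.

```python
from typing import Dict


def to_ntriple(
    subject_uri: str,
    ns_abbrev: str,
    class_name: str,
    object_uri: str,
    namespaces: Dict[str, str],
) -> str:
    """Expand an abbreviated predicate and format one N-Triples statement."""
    predicate_uri = namespaces[ns_abbrev] + class_name
    return f"<{subject_uri}> <{predicate_uri}> <{object_uri}> ."


# NS records reduced to an abbreviation-to-URI mapping.
NAMESPACES = {"rdfs": "http://www.w3.org/2000/01/rdf-schema#"}

triple = to_ntriple(
    "urn:example:component:weight",  # hypothetical subject identifier
    "rdfs",
    "isDefinedBy",
    "http://example.org/vocab/BodyWeight",  # hypothetical object URI
    NAMESPACES,
)
```

Keeping the namespace lookup separate from the predicate name matches the uniqueness rule on (namespace, class name): the pair identifies exactly one predicate URI.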

DM (Data Model)

Purpose: The root node of a complete data model with architectural separation of identity and clinical data.

Intended Uses:
  - Complete data model definition
  - Schema generation with privacy-by-design
  - Documentation creation
  - Application generation

DM Structure - Architectural Sections:

  1. data (Cluster) - Clinical/domain data ONLY
     - Contains observations, measurements, assessments, results
     - Should NOT contain identity information
     - Can be shared without identity when subject is redacted
  2. subject (Party) - Who data is ABOUT
     - Patient, research participant, person of interest
     - Contains identifying information (name, ID, demographics)
     - Can be redacted for de-identification
  3. provider (Party) - Primary provider of services/care
     - Healthcare organization, government agency, research institution
     - Primary responsible entity
  4. participations (ManyToMany Participation) - Other parties involved
     - Nurses, consultants, technicians, assistants
     - Any additional entities in data creation process
  5. audit (ManyToMany Audit) - Audit trail
  6. attestation (Attestation) - Data verification
  7. protocol (XdString) - External protocol ID
  8. workflow (XdLink) - Workflow engine identifier
  9. acs (XdLink) - Access control system
  10. links (ManyToMany XdLink) - Ad-hoc links

Privacy-by-Design Architecture: - ✅ Clinical data in data Cluster - ✅ Identity in subject Party - ✅ Separation enables de-identification - ❌ Never mix identity into data Cluster
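
Because identity lives only in the subject Party, de-identification can be as simple as dropping that one field. The Python sketch below illustrates the idea with simplified stand-in classes; `Party`, `DataModel`, `deidentify`, and the sample values are hypothetical and cover only three of the DM sections.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional


@dataclass(frozen=True)
class Party:
    name: str
    identifiers: Dict[str, str]


@dataclass(frozen=True)
class DataModel:
    data: Dict[str, str]  # stands in for the data Cluster
    subject: Optional[Party]  # who the data is about
    provider: Optional[Party]  # primary provider of services


def deidentify(dm: DataModel) -> DataModel:
    """Return a copy with the subject redacted and everything else kept."""
    return replace(dm, subject=None)


record = DataModel(
    data={"weight_kg": "72.5"},
    subject=Party("Jane Doe", {"MRN": "12345"}),
    provider=Party("Example Clinic", {}),
)
shared = deidentify(record)
```

The redaction only works if no identifying value was placed in `data`, which is why the separation is stated as a hard rule rather than a convention.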

Generated Outputs:
  - XSD schema files
  - XML instance documents
  - JSON representations
  - JSON-LD schema (semantic descriptions)
  - HTML documentation
  - RDF triples for semantic web
  - SHACL constraint files
  - GQL CREATE statements

Common Features Across All Models

Base Constraints (Common class)

All models inherit these common features:
  - project: Associated project
  - public: Public visibility
  - label: Human-readable label
  - ct_id: CUID2 unique identifier
  - created/updated: Timestamps
  - published: Publication status
  - description: Detailed description
  - pred_obj: Semantic links
  - schema_code: Generated schema code
  - lang: Language specification
  - creator/edited_by: Attribution
  - seq: Sequence number for UI ordering
  - validate: Hard validation flag
  - app_code: Generated application code

XdAny Features

All data types inherit these features:
  - adapter_ctid: Adapter identifier
  - require_act/allow_act: Access control requirements
  - require_vtb/allow_vtb: Valid time begin
  - require_vte/allow_vte: Valid time end
  - require_tr/allow_tr: Time recorded
  - require_mod/allow_mod: Time modified
  - require_location/allow_location: Location requirements
  - ui_type: Preferred UI type (input, dropdown, radio, etc.)

Publication Workflow

All models follow a two-stage publication process:

  1. Draft: Models can be created and modified
  2. Published: Models are locked and can be used in data models

Published models cannot be deleted and are available for use in data model generation.
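
The draft and published stages behave like a one-way lock. The Python sketch below shows that behavior under simple assumptions: `Component`, its methods, and `PublishedModelError` are hypothetical names, and the real system may enforce the lock differently.

```python
from dataclasses import dataclass


class PublishedModelError(Exception):
    """Raised when a locked (published) component is modified or deleted."""


@dataclass
class Component:
    label: str
    published: bool = False

    def rename(self, new_label: str) -> None:
        """Edits are allowed only while the component is a draft."""
        if self.published:
            raise PublishedModelError("published components are locked")
        self.label = new_label

    def publish(self) -> None:
        """Move from draft to published; there is no way back in this sketch."""
        self.published = True

    def delete(self) -> None:
        """Drafts may be deleted; published components may not."""
        if self.published:
            raise PublishedModelError("published components cannot be deleted")
        # A draft would be removed from storage here.
```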

Best Practices

Model Selection

  1. Use XdString for: Text data, codes, identifiers, enumerated values
  2. Use XdBoolean for: True/false decisions only (not multi-value enums)
  3. Use XdCount for: Integer values with units (counts, quantities)
  4. Use XdQuantity for: Decimal values with units (measurements)
  5. Use XdTemporal for: Date/time values with flexible precision
  6. Use XdOrdinal for: Rankings, scores, and ordered categories
  7. Use Clusters for: Grouping related fields and creating hierarchy

Constraint Guidelines

  1. Start with minimal constraints and add more as needed
  2. Use enumerations for controlled vocabularies
  3. Set appropriate ranges for numeric values
  4. Define units for all quantified values
  5. Use reference ranges for normal/abnormal value definitions
  6. Add semantic links for ontology integration

Naming Conventions

  1. Use descriptive labels that will appear in generated UIs
  2. Follow consistent naming within projects
  3. Use clear, concise descriptions
  4. Include usage examples in descriptions

Publication Strategy

  1. Create and test models in draft mode
  2. Review and validate before publication
  3. Use published models in data models
  4. Version control through project organization

This comprehensive model system provides the foundation for creating robust, semantic data models that can be used across various domains and applications.