SDCStudio System Overview
What is SDCStudio?
SDCStudio is an AI-powered data modeling platform that helps organizations create, manage, and deploy SDC4-compliant data models and applications. It combines AI-assisted analysis of uploaded data with generation of schemas, documentation, and Django applications, streamlining the data modeling workflow from raw files to deployable output.
Core System Architecture
Three Main Components
1. Uploader App - Data Ingestion & Processing
- Purpose: Handles file uploads and AI-powered data analysis
- Capabilities:
- CSV file support with column analysis
- Markdown template support (Form2SDCTemplate, SDCObsidianTemplate)
- Intelligent column detection and type inference
- Semantic context analysis using AI with ontology integration
- Two-stage processing pipeline for reliability
- Key Features:
- Real-time processing status updates via React SPA interface
- Automatic retry mechanisms for failed operations
- Intelligent discovery of data patterns and relationships
- Custom ontology upload in Turtle (.ttl) format for domain-specific enhancement
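The retry behaviour is described only as "automatic retry". A common pattern for this is exponential backoff. The minimal sketch below assumes a hypothetical analysis step that raises an exception on a transient failure. It is not SDCStudio's actual code: because the backend runs background work under Celery, production code would more likely use Celery's built-in task retry support. This version uses only the standard library.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `operation`, retrying on any exception with exponentially growing waits.

    Hypothetical helper for illustration; not SDCStudio's actual retry code.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # Wait base_delay, 2*base_delay, 4*base_delay, ... between attempts.
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


# Usage: a stand-in analysis step that fails twice, then succeeds.
calls = {"n": 0}


def flaky_analysis() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "analysis complete"


delays: list[float] = []
result = retry_with_backoff(flaky_analysis, sleep=delays.append)
print(result, delays)  # analysis complete [0.5, 1.0]
```

Passing `sleep` as a parameter keeps the backoff schedule observable and makes the helper testable without real waiting.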
2. DMGEN App - Data Model Management
- Purpose: Core data modeling and component management
- Capabilities:
- Component-based data model creation
- SDC4-compliant data type support
- Validation rule management
- Relationship and constraint definition
- Key Features:
- Visual component editor
- Real-time validation feedback
- Component reuse and sharing
- Version control and change tracking
3. Generator App - Output Generation & Applications
- Purpose: Creates various outputs from data models
- Capabilities:
- XSD Schema generation
- XML Instance generation (schema-aware)
- JSON Schema creation
- JSON-LD (Linked Data) schema generation
- HTML documentation generation
- RDF triple extraction for semantic web
- SHACL (Shapes Constraint Language) validation files
- GQL (Graph Query Language) CREATE statements
- Django application generation via AppGen
- Key Features:
- Schema-aware XML generation with unique instance IDs
- Template-based output customization
- Multi-format export options (8+ output formats)
- Full Django application scaffolding with Docker deployment
AI-Powered Processing Pipeline
Two-Stage Processing Architecture
Phase 1: Structural Analysis (Fast)
- Duration: 30 seconds to 2 minutes
- Purpose: Quick structural understanding of uploaded data
- Process:
- File format detection and parsing
- Basic structure identification
- Column/field extraction
- Initial type inference
- Output: Basic data structure map
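The Phase 1 steps can be sketched for CSV input as follows. The sketch parses the file, extracts the columns, and assigns each column an initial type. It uses the SDC4 type names listed under Core Data Types below. The inference rules are simple heuristics chosen for illustration; SDCStudio's actual rules are not documented here.

```python
import csv
import io
from datetime import datetime


def _is_int(v: str) -> bool:
    try:
        int(v)
        return True
    except ValueError:
        return False


def _is_float(v: str) -> bool:
    try:
        float(v)
        return True
    except ValueError:
        return False


def _is_datetime(v: str) -> bool:
    try:
        datetime.fromisoformat(v)
        return True
    except ValueError:
        return False


def infer_type(values: list[str]) -> str:
    """Guess an SDC4 type name for one column from its raw string values.

    Heuristic for illustration; not SDCStudio's real inference logic.
    """
    sample = [v.strip() for v in values if v.strip()]
    if not sample:
        return "XdString"  # nothing to go on: fall back to text
    if all(v.lower() in {"true", "false"} for v in sample):
        return "XdBoolean"
    if all(_is_int(v) for v in sample):
        return "XdCount"
    if all(_is_float(v) for v in sample):
        return "XdFloat"
    if all(_is_datetime(v) for v in sample):
        return "XdDateTime"
    return "XdString"


def structure_map(csv_text: str) -> dict[str, str]:
    """Parse CSV text, extract its columns, and assign each an initial type."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    columns = reader.fieldnames or []
    return {col: infer_type([row.get(col) or "" for row in rows]) for col in columns}


sample_csv = """patient_id,age,weight_kg,visit_date,smoker,notes
P001,34,70.5,2024-01-15,true,first visit
P002,58,82.0,2024-02-03,false,
"""
print(structure_map(sample_csv))
# {'patient_id': 'XdString', 'age': 'XdCount', 'weight_kg': 'XdFloat',
#  'visit_date': 'XdDateTime', 'smoker': 'XdBoolean', 'notes': 'XdString'}
```

The checks run from most to least specific. As a result, "34" is typed as a count rather than a float, and anything that fails every check falls back to text.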
Phase 2: AI Enhancement (Comprehensive)
- Duration: 1-5 minutes depending on complexity
- Purpose: Deep semantic understanding and optimization
- Process:
- Semantic context analysis
- Pattern recognition across data
- Relationship detection
- Component optimization suggestions
- Validation rule recommendations
- Output: Enhanced data model with semantic context
AI Capabilities
Intelligent Discovery
- UI Type Detection: Automatically suggests appropriate UI components
- Pattern Recognition: Identifies repeating structures and relationships
- Reference Range Detection: Suggests validation ranges based on data
- Cross-Cluster Analysis: Finds patterns across multiple data clusters
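This document does not say how reference ranges are derived. One simple approach, shown here as an assumption and not as SDCStudio's method, takes the observed minimum and maximum and widens them by a fraction of the span. This way, plausible future values just outside the sample still validate.

```python
def suggest_reference_range(values: list[float], margin: float = 0.1) -> tuple[float, float]:
    """Suggest (min, max) bounds from observed values, widened by `margin` of the span.

    Heuristic for illustration only; not SDCStudio's actual detection method.
    """
    if not values:
        raise ValueError("need at least one value")
    low, high = min(values), max(values)
    pad = (high - low) * margin  # widen so borderline future values still pass
    return low - pad, high + pad


heart_rates = [60, 72, 80, 95]
suggested = suggest_reference_range(heart_rates)
print(suggested)  # (56.5, 98.5)
```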
Semantic Enhancement
- Context Understanding: Analyzes data meaning and purpose
- Industry Alignment: Suggests domain-specific components
- Best Practice Recommendations: Applies SDC4 modeling standards
- Relationship Mapping: Identifies connections between data elements
Data Model Components
Core Data Types
Primitive Types
- XdString: Text data with validation rules
- XdCount: Integer values with constraints
- XdQuantity: Numeric values with units
- XdBoolean: True/false values
- XdDateTime: Date and time values
- XdFloat/XdDouble: Floating-point numbers
Complex Types
- Clusters: Grouped related components
- Lists: Ordered collections of components
- References: Links to other components
- Adapters: Wrapper components for special handling
Component Features
Validation & Constraints
- Pattern Matching: Regular expression validation
- Range Validation: Min/max value constraints
- Enumeration: Predefined value lists
- Required/Optional: Field requirement rules
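The four constraint kinds above can be combined in one checker. The sketch below is a minimal example that assumes a hypothetical `Constraints` container; it does not reproduce SDCStudio's internal representation. Each rule is checked independently, so one value can report several violations.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Constraints:
    """Hypothetical bundle of the four rule kinds described above."""
    required: bool = False
    pattern: Optional[str] = None        # regex the whole value must match
    min_value: Optional[float] = None    # inclusive lower bound
    max_value: Optional[float] = None    # inclusive upper bound
    enumeration: Optional[list] = None   # allowed values


def validate(value, rules: Constraints) -> list[str]:
    """Return human-readable violations; an empty list means the value is valid."""
    if value is None or value == "":
        return ["value is required"] if rules.required else []
    errors = []
    if rules.pattern is not None and not re.fullmatch(rules.pattern, str(value)):
        errors.append(f"{value!r} does not match pattern {rules.pattern!r}")
    if rules.min_value is not None and float(value) < rules.min_value:
        errors.append(f"{value!r} is below minimum {rules.min_value}")
    if rules.max_value is not None and float(value) > rules.max_value:
        errors.append(f"{value!r} is above maximum {rules.max_value}")
    if rules.enumeration is not None and value not in rules.enumeration:
        errors.append(f"{value!r} is not one of {rules.enumeration}")
    return errors


blood_type = Constraints(required=True, enumeration=["A", "B", "AB", "O"])
age = Constraints(min_value=0, max_value=130)
zip_code = Constraints(pattern=r"\d{5}")

print(validate("AB", blood_type))  # []
print(validate("", blood_type))    # ['value is required']
print(validate(150, age))          # ['150 is above maximum 130']
print(validate("1234", zip_code))  # one pattern violation
```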
Metadata & Relationships
- Labels: Human-readable descriptions
- Documentation: Detailed component information
- Tags: Categorization and search
- Dependencies: Component relationships
Output Generation System
Schema Generation
XSD Schema
- Purpose: XML Schema Definition for data validation
- Features: Full SDC4 compliance with custom extensions
- Output: Standard XSD file with proper namespaces
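To illustrate how a pattern rule becomes XML Schema, the sketch below builds a tiny XSD that declares a restricted string type. It covers only generic XSD structure. Real SDC4 schemas are built on the SDC4 reference model and are considerably richer, and the type name used here is hypothetical.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)  # serialize the namespace with the usual "xs" prefix


def string_restriction_schema(type_name: str, pattern: str, max_length: int) -> str:
    """Build a minimal XSD declaring one restricted string simple type.

    Generic XML Schema only; not the structure SDCStudio actually emits.
    """
    schema = ET.Element(f"{{{XS}}}schema")
    simple = ET.SubElement(schema, f"{{{XS}}}simpleType", name=type_name)
    restriction = ET.SubElement(simple, f"{{{XS}}}restriction", base="xs:string")
    ET.SubElement(restriction, f"{{{XS}}}pattern", value=pattern)
    ET.SubElement(restriction, f"{{{XS}}}maxLength", value=str(max_length))
    return ET.tostring(schema, encoding="unicode")


xsd = string_restriction_schema("PostalCode", r"\d{5}", 5)
print(xsd)
```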
XML Instance
- Purpose: Example XML documents conforming to schemas
- Features: Schema-aware generation with realistic data
- Output: Valid XML files ready for testing and development
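The unique-instance-ID behaviour can be sketched by attaching a fresh UUID to each generated document. The element names and the `instance-id` attribute below are hypothetical; real SDC4 instances follow the structure of the generated XSD. The sketch demonstrates only the ID step, not schema-aware value generation.

```python
import uuid
import xml.etree.ElementTree as ET


def make_instance(root_name: str, values: dict[str, str]) -> str:
    """Create a simple XML instance whose root carries a fresh unique ID.

    Element/attribute names are illustrative, not SDC4's actual layout.
    """
    root = ET.Element(root_name, {"instance-id": str(uuid.uuid4())})
    for name, value in values.items():
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="unicode")


a = make_instance("visit", {"age": "34", "smoker": "true"})
b = make_instance("visit", {"age": "34", "smoker": "true"})
print(a)
print(a != b)  # same data, different instance IDs
```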
JSON Schema
- Purpose: JSON validation schemas
- Features: JSON Schema Draft 2020-12 compliance
- Output: JSON schema files for API development
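The following sketch shows how component definitions might map onto a Draft 2020-12 JSON Schema. The input format and the SDC4-to-JSON type mapping are assumptions made for illustration; only the `$schema` URI and keywords such as `minimum`, `pattern`, and `enum` come from the JSON Schema specification itself.

```python
import json

# Assumed mapping from SDC4 type names to JSON Schema types (illustrative only).
TYPE_MAP = {
    "XdString": "string",
    "XdCount": "integer",
    "XdFloat": "number",
    "XdBoolean": "boolean",
    "XdDateTime": "string",
}


def to_json_schema(title: str, fields: dict[str, dict]) -> dict:
    """Build a Draft 2020-12 object schema from simple field descriptions."""
    properties, required = {}, []
    for name, spec in fields.items():
        prop = {"type": TYPE_MAP[spec["type"]]}
        if spec["type"] == "XdDateTime":
            prop["format"] = "date-time"
        for key in ("minimum", "maximum", "pattern", "enum"):
            if key in spec:
                prop[key] = spec[key]
        properties[name] = prop
        if spec.get("required"):
            required.append(name)
    return {
        "$schema": "https://json-schema.org/draft/2020-12/schema",
        "title": title,
        "type": "object",
        "properties": properties,
        "required": required,
    }


schema = to_json_schema("Visit", {
    "age": {"type": "XdCount", "minimum": 0, "maximum": 130, "required": True},
    "visit_date": {"type": "XdDateTime"},
})
print(json.dumps(schema, indent=2))
```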
JSON-LD (Linked Data)
- Purpose: Semantic descriptions for linked data integration
- Features: RDF-compatible JSON with context mappings
- Output: JSON-LD files for knowledge graphs and semantic web
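The "context mappings" are what turn ordinary JSON into linked data. In the minimal sketch below, an `@vocab` entry maps every bare property name (and the `@type` value) to an IRI under one vocabulary. The vocabulary IRI and node ID are placeholders, not SDCStudio's actual namespaces.

```python
import json


def to_jsonld(node_id: str, type_name: str, values: dict, vocab: str) -> dict:
    """Wrap plain key/value data as a JSON-LD node.

    With "@vocab" set, a property such as "age" expands to vocab + "age" when the
    document is processed as RDF.
    """
    return {"@context": {"@vocab": vocab}, "@id": node_id, "@type": type_name, **values}


doc = to_jsonld(
    "urn:example:visit:1",
    "Visit",
    {"age": 34, "smoker": True},
    vocab="https://example.org/vocab#",  # placeholder vocabulary IRI
)
print(json.dumps(doc, indent=2))
```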
RDF Triples
- Purpose: Semantic web integration and triple store loading
- Features: Subject-predicate-object statements extracted from schemas
- Output: RDF triples for SPARQL queries and semantic databases
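To show what subject-predicate-object statements look like on the wire, the sketch below serializes a few values as N-Triples with XSD datatypes. It illustrates the triple format only and does not show how SDCStudio extracts triples from schemas. The IRIs are placeholders.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"


def literal(value: object) -> str:
    """Serialize a Python value as an N-Triples literal."""
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return f'"{str(value).lower()}"^^<{XSD}boolean>'
    if isinstance(value, int):
        return f'"{value}"^^<{XSD}integer>'
    if isinstance(value, float):
        return f'"{value}"^^<{XSD}double>'
    text = (str(value).replace("\\", "\\\\").replace('"', '\\"')
            .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{text}"'  # plain string literal


def to_ntriples(subject: str, properties: dict[str, object]) -> list[str]:
    """Emit one `<subject> <predicate> object .` line per property."""
    return [f"<{subject}> <{pred}> {literal(obj)} ." for pred, obj in properties.items()]


triples = to_ntriples("https://example.org/visit/1", {
    "https://example.org/vocab#age": 34,
    "https://example.org/vocab#smoker": False,
    "https://example.org/vocab#notes": 'said "fine"',
})
print("\n".join(triples))
```

Lines in this form can be bulk-loaded into a triple store and queried with SPARQL.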
SHACL (Shapes Constraint Language)
- Purpose: RDF data validation and quality checking
- Features: Complete validation constraints as shape definitions
- Output: SHACL files for semantic data validation
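The sketch below shows how the same kinds of constraints (datatype, required, min/max) could be expressed as a SHACL shape in Turtle. The input format and IRIs are hypothetical. The `sh:` terms (`NodeShape`, `targetClass`, `property`, `path`, `datatype`, `minCount`, `minInclusive`, `maxInclusive`) are standard SHACL vocabulary.

```python
def node_shape(shape_iri: str, target_class: str, props: list[dict]) -> str:
    """Render a SHACL NodeShape in Turtle from simple property descriptions."""
    lines = [
        "@prefix sh: <http://www.w3.org/ns/shacl#> .",
        "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
        "",
        f"<{shape_iri}> a sh:NodeShape ;",
        f"    sh:targetClass <{target_class}> ;",
    ]
    blocks = []
    for p in props:
        parts = [f"sh:path <{p['path']}>", f"sh:datatype xsd:{p['datatype']}"]
        if p.get("required"):
            parts.append("sh:minCount 1")  # at least one value must be present
        if "min" in p:
            parts.append(f"sh:minInclusive {p['min']}")
        if "max" in p:
            parts.append(f"sh:maxInclusive {p['max']}")
        blocks.append("    sh:property [ " + " ; ".join(parts) + " ]")
    lines.append(" ;\n".join(blocks) + " .")
    return "\n".join(lines)


ttl = node_shape(
    "https://example.org/shapes/VisitShape",
    "https://example.org/vocab#Visit",
    [
        {"path": "https://example.org/vocab#age", "datatype": "integer",
         "required": True, "min": 0, "max": 130},
        {"path": "https://example.org/vocab#smoker", "datatype": "boolean"},
    ],
)
print(ttl)
```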
GQL (Graph Query Language)
- Purpose: Property graph database integration
- Features: CREATE statements for nodes and relationships
- Output: GQL scripts for Neo4j and other graph databases
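The sketch below shows how nodes and a relationship might be rendered as CREATE statements. It uses the Cypher-style syntax that Neo4j accepts; the exact syntax of SDCStudio's generated scripts may differ. Labels, property names, and the relationship type are hypothetical. To keep escaping simple, identifiers are restricted to plain alphanumerics.

```python
import re

IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _check(*names: str) -> None:
    """Reject labels/keys that would need quoting, to keep the sketch simple."""
    for name in names:
        if not IDENTIFIER.fullmatch(name):
            raise ValueError(f"unsafe identifier: {name!r}")


def gql_value(value: object) -> str:
    """Render a Python value as a query literal."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    text = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return f"'{text}'"


def create_node(var: str, label: str, props: dict[str, object]) -> str:
    _check(var, label, *props)
    body = ", ".join(f"{key}: {gql_value(val)}" for key, val in props.items())
    return f"CREATE ({var}:{label} {{{body}}})"


def create_edge(src: str, rel_type: str, dst: str) -> str:
    _check(src, rel_type, dst)
    return f"CREATE ({src})-[:{rel_type}]->({dst})"


script = "\n".join([
    create_node("p", "Patient", {"patient_id": "P001"}),
    create_node("v", "Visit", {"age": 34, "smoker": True}),
    create_edge("p", "HAD_VISIT", "v"),
])
print(script)
# CREATE (p:Patient {patient_id: 'P001'})
# CREATE (v:Visit {age: 34, smoker: true})
# CREATE (p)-[:HAD_VISIT]->(v)
```

The three clauses must run as one query so that the edge can refer to the `p` and `v` variables bound by the node clauses.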
Application Generation
AppGen generates complete Django web applications from your data models. Each data model creates a separate Django app that can run standalone or be combined with other generated apps into a single Django project. See the generated README for multi-app integration instructions.
Choose between two deployment options:
Lightweight Template (Open Source)
- Purpose: Simple, production-ready Django application for open-source projects
- Best For: Small to medium projects, prototyping, learning, community projects
- Infrastructure Stack:
- PostgreSQL 16: Relational database
- Apache Jena Fuseki: RDF triplestore for semantic data
- Redis: Caching and message broker
- Django Web App: CRUD interface with forms and validation
- Celery: Background task processing
- Features:
- Full CRUD operations (Create, Read, Update, Delete)
- Django models matching data model components
- Web forms with built-in validation
- Admin interface for data management
- XML instance generation from entered data
- API endpoints for data access
- Docker Compose deployment (5 containers)
- Resource Requirements: Moderate (suitable for single server deployment)
Enterprise Template (Production-Ready)
- Purpose: Advanced enterprise application with semantic reasoning and security
- Best For: Large organizations, regulated industries, complex semantic requirements
- Infrastructure Stack:
- PostgreSQL 16: Relational database
- GraphDB Free Edition: OWL 2 reasoning triplestore with RDFS, OWL 2 RL, and SHACL validation
- SirixDB: Temporal database with time-travel queries and full versioning
- Keycloak: Enterprise SSO and role-based access control (RBAC)
- Redis: Caching and message broker
- Django Web App: CRUD interface with forms and validation
- Celery: Background task processing
- Advanced Features:
- All Lightweight features PLUS:
- OWL 2 Reasoning: Automatic inference and semantic validation
- Temporal Queries: Time-travel queries and complete audit trail via SirixDB
- Enterprise Authentication: Keycloak SSO with RBAC and OAuth2/OIDC
- SHACL Validation: RDF constraint validation
- Enhanced Semantics: Knowledge graph capabilities with reasoning
- Resource Requirements: Higher (recommended for cluster deployment)
- Docker Compose: 7+ containers with health checks and dependencies
Common Features (Both Templates)
- Complete Django project structure (one per data model)
- SDC4-compliant data models
- Responsive Bootstrap UI
- Docker deployment ready
- Automatic migrations and initialization
- XML instance generation
- Comprehensive documentation
- MIT License (open source)
- Multi-App Support: Combine multiple generated apps into a single Django project
- Each data model generates a separate Django app
- Apps can be integrated into one running project
- Detailed instructions included in generated README
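To illustrate the idea of "Django models matching data model components", the sketch below renders a `models.py` source string from component descriptions. The type-to-field mapping and the input format are assumptions, and AppGen's actual generated code is not shown in this document. The output only needs Django installed when it is run, not when it is generated.

```python
# Assumed mapping from SDC4 type names to Django field declarations (illustrative).
FIELD_MAP = {
    "XdString": "models.CharField(max_length={max_length}, blank={blank})",
    "XdCount": "models.IntegerField(null={blank}, blank={blank})",
    "XdFloat": "models.FloatField(null={blank}, blank={blank})",
    "XdBoolean": "models.BooleanField(default=False)",
    "XdDateTime": "models.DateTimeField(null={blank}, blank={blank})",
}


def render_model(class_name: str, components: dict[str, dict]) -> str:
    """Return Python source for one Django model class."""
    lines = ["from django.db import models", "", "", f"class {class_name}(models.Model):"]
    for field_name, spec in components.items():
        field = FIELD_MAP[spec["type"]].format(
            max_length=spec.get("max_length", 255),
            blank=not spec.get("required", False),  # optional fields may be left empty
        )
        lines.append(f"    {field_name} = {field}")
    if not components:
        lines.append("    pass")
    return "\n".join(lines) + "\n"


source = render_model("Visit", {
    "patient_id": {"type": "XdString", "max_length": 10, "required": True},
    "age": {"type": "XdCount"},
    "smoker": {"type": "XdBoolean"},
})
print(source)
```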
How to Choose
- Use Lightweight if you need:
- Quick prototyping and development
- Simple deployment on a single server
- Basic RDF triplestore capabilities
- Lower infrastructure costs
- Open-source community projects
- Use Enterprise if you need:
- OWL 2 reasoning and semantic inference
- Temporal versioning and audit trails
- Enterprise SSO and RBAC
- SHACL validation for data quality
- Advanced knowledge graph capabilities
- Compliance and regulatory requirements
HTML Documentation
- Purpose: Human-readable documentation of data models
- Features: Comprehensive component documentation with descriptions
- Output: Styled HTML pages for team reference and stakeholder review
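A minimal sketch of component documentation as HTML appears below. It assumes a hypothetical list of component dictionaries and omits the styling. The key detail is escaping: labels and descriptions are user-entered text, so characters such as `<` must be escaped before they are inserted into the page.

```python
from html import escape


def component_table(model_name: str, components: list[dict[str, str]]) -> str:
    """Render a model's components as an HTML table, escaping all user text."""
    rows = "\n".join(
        "  <tr>"
        f"<td>{escape(c['label'])}</td>"
        f"<td>{escape(c['type'])}</td>"
        f"<td>{escape(c.get('description', ''))}</td>"
        "</tr>"
        for c in components
    )
    return (
        f"<h1>{escape(model_name)}</h1>\n"
        "<table>\n"
        "  <tr><th>Label</th><th>Type</th><th>Description</th></tr>\n"
        f"{rows}\n"
        "</table>"
    )


page = component_table("Visit", [
    {"label": "Age", "type": "XdCount", "description": "Age in years (<= 130)"},
    {"label": "Smoker", "type": "XdBoolean"},
])
print(page)
```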
Integration & Extensibility
External System Integration
Ontology Management
- Purpose: Semantic enhancement and standardization
- Capabilities: Upload and manage ontology files
- Benefits: Improved AI understanding and suggestions
API Access
- Purpose: Programmatic access to system capabilities
- Features: RESTful API with authentication
- Use Cases: Automated workflows, third-party integration
Export Options
- Formats: Multiple output formats for different use cases
- Standards: Industry-standard compliance (SDC4, JSON Schema)
- Customization: Template-based output customization
Development & Deployment
Development Environment
- Setup: Docker-based development environment
- Tools: Django backend with a React frontend
- Testing: Comprehensive test suite and validation
Production Deployment
- Options: Cloud deployment, on-premises installation
- Scaling: Horizontal scaling capabilities
- Monitoring: Built-in logging and performance monitoring
System Benefits
For Data Modelers
- Faster Modeling: AI-assisted component creation and optimization
- Better Quality: Built-in validation and best practices
- Standards Compliance: Automatic SDC4 compliance checking
- Collaboration: Team-based modeling and sharing
For Developers
- Rapid Prototyping: Quick generation of working applications
- Consistent Output: Standardized schemas and templates
- Integration Ready: Pre-built API endpoints and data models
- Maintainable Code: Generated code follows best practices
For Organizations
- Reduced Time-to-Market: Faster data model development
- Improved Quality: AI-optimized models and validation
- Standards Alignment: Consistent SDC4 compliance
- Cost Reduction: Automated generation reduces development time
Technology Stack
Backend
- Framework: Django (Python)
- Database: PostgreSQL with vector extensions
- AI Processing: LLM integration with Google Vertex AI (Gemini 2.0 Flash)
- Task Queue: Celery with Redis
Frontend
- Framework: React 19 with TypeScript
- Build Tool: Vite for fast development and optimized production builds
- Styling: Tailwind CSS for responsive design
- Components: Reusable component library
- Interactions: Real-time updates via WebSocket and REST API
- Deployment: Static files served via WhiteNoise in production
Infrastructure
- Containerization: Docker and Docker Compose
- Search: Vector-based semantic search
- Caching: Redis for performance optimization
- Monitoring: Built-in logging and error tracking
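Vector-based semantic search ranks stored items by how close their embedding vectors are to a query vector. The in-memory sketch below shows that ranking step with cosine similarity. The three-dimensional vectors are toy values, not output from a real embedding model. In SDCStudio the vectors are stored in PostgreSQL with vector extensions, so the comparison would run inside the database rather than in Python.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 for identical direction, 0.0 for orthogonal vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], index: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the names of the k stored vectors most similar to the query."""
    return sorted(index, key=lambda name: cosine(query, index[name]), reverse=True)[:k]


# Toy 3-dimensional "embeddings"; real ones come from a model and are much larger.
index = {
    "body weight": [0.9, 0.1, 0.0],
    "body height": [0.7, 0.3, 0.1],
    "postal code": [0.0, 0.2, 0.9],
}
print(top_k([1.0, 0.0, 0.0], index))  # ['body weight', 'body height']
```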
User Interface
React SPA (Primary Interface)
- Purpose: Modern, responsive single-page application
- Features:
- Real-time status updates without page refresh
- Interactive data model editing
- WebSocket-based Atlas chatbot for assistance
- Responsive design for all devices
- Intuitive navigation and breadcrumbs
- Access: Primary interface at /app/* routes
Settings Management
- Profile Configuration: User preferences and organization details
- Ontology Upload: Custom domain ontologies in Turtle (.ttl) format
- Preferences: Interface customization and notification settings
- Importance: Configure Settings and upload any domain ontologies before creating models, so the AI has that context and produces the best results
Future Roadmap
Planned Enhancements
- Enhanced Template Support: Expanded Markdown template capabilities
- Advanced AI Models: Improved semantic understanding and suggestions
- Real-time Collaboration: Multi-user editing and commenting
- Advanced Analytics: Usage tracking and model optimization
- Workflow Automation: Custom processing pipelines
Integration Expansion
- Enterprise Systems: ERP and CRM integration
- Cloud Services: Enhanced GCP integration, AWS and Azure support
- Triplestore Options: GraphDB and other RDF database integrations
- Knowledge Graph: Enhanced semantic search and relationship discovery
Ready to get started? Check out the Quick Start Guide to begin using SDCStudio, or explore the User Guides for detailed information about specific features.