Module 1 Exercise - Reference Answer Key
Part 1: Semantic Ambiguity
| Column/Value | Ambiguity |
|---|---|
region | Inconsistent naming: "Northeast" vs "NE" vs "northeast". "West" vs "West Coast" vs "Western". "Asia-Pacific" vs "APAC" vs "AP". A receiving system cannot resolve these without a mapping table. |
status | Mixed case: "active" vs "Active" vs "ACTIVE". Typo: "actve" (row 9). Non-standard values: "test", "demo", "system", "migrated", "VIP", "dev". No enumeration enforced. |
phone | Multiple formats: "555-0101" vs "(555) 0102" vs "5550103" vs "+86-10-5550108". Some rows use "N/A" or "000-0000" as placeholders instead of proper null handling. |
signup_date | Multiple date formats: "2024-01-15" vs "01/20/2024" vs "2024/04/15" vs "May 1 2024". A parser assuming ISO 8601 will fail on at least 8 rows. |
name | "Jane Smith" appears in rows 1, 3, 12, 29, 41 - same person? "Bob Johnson" vs "Robert Johnson" vs "Robert 'Bob' Johnson" - how many people? "ACME Corp" vs "acme corporation" - same company? |
customer_id | Auto-increment integers scoped to this export. Row 1's customer_id "1" means nothing in any other system. Level 1 Semantic Identity. |
Part 2: Constraint Enforcement Failures
| Row(s) | Violation |
|---|---|
| Row 7 | Missing name (empty field). |
| Row 9 | Status typo: "actve" instead of "active". No enumeration constraint. |
| Row 13 | Test account in production data. Name "Test Account", status "test". |
| Row 19 | Name is "NULL" (the string). Signup date "1900-01-01" is a placeholder. |
| Row 28 | Test data: status "demo", balance 999999.99, region "Demo". |
| Row 40 | System placeholder: name "Placeholder Entry", date "1970-01-01" (Unix epoch). |
| Row 42 | Dev data: name "TESTING", status "dev", negative balance. |
At least 11 rows would fail basic constraint enforcement. This is Level 1-2 Constraint Enforcement.
Part 3: Identity Instability
| Entity | Representations | Rows |
|---|---|---|
| Jane Smith | Different emails, same phone in some rows | 1, 3, 12, 29, 41 |
| Bob/Robert Johnson | "Bob Johnson", "Robert Johnson", "Bob Johnson Jr", "Robert 'Bob' Johnson" | 2, 4, 15, 30 |
| ACME Corp | "ACME Corp", "acme corporation", "Acme Corporation" with different emails | 5, 6, 22, 31 |
| Patricia/Pat Williams | "Patricia Williams" vs "pat williams" | 32, 33 |
| Michael/Mike Brown | "Michael Brown" vs "mike brown", same email | 36, 37 |
| Global Solutions | "Global Solutions LLC" vs "global solutions" | 16, 17 |
The customer_id column provides no help - each duplicate has a different ID. This is Level 1 Semantic Identity.
Part 4: Floor Constraint Application
| Dimension | Working Score | Evidence |
|---|---|---|
| Schema Integrity | Level 2 | A schema exists (columns defined) but not enforced at write time - test data, placeholders, and inconsistent formats accepted. |
| Constraint Enforcement | Level 1 | No enumeration on status, no format enforcement on phone or date, no null handling policy. |
| Semantic Identity | Level 1 | Auto-increment IDs scoped to one database. Duplicate entities have different IDs. |
Floor Constraint
min(2, 1, 1) = Level 1. All derived dimensions (Provenance, Interoperability, Governance) are functionally capped at Level 1. You cannot trace the provenance of data you cannot identify. You cannot share data whose meaning is ambiguous. You cannot govern what you cannot constrain.
Recommended first intervention: Semantic Identity (Level 1 to Level 3 via CUID2 assignment), then Constraint Enforcement (enumeration lists, format validation, null handling), then Schema Integrity (versioned, enforced at write time).