Module 1 Exercise - Reference Answer Key

Part 1: Semantic Ambiguity

Column/ValueAmbiguity
regionInconsistent naming: "Northeast" vs "NE" vs "northeast". "West" vs "West Coast" vs "Western". "Asia-Pacific" vs "APAC" vs "AP". A receiving system cannot resolve these without a mapping table.
statusMixed case: "active" vs "Active" vs "ACTIVE". Typo: "actve" (row 9). Non-standard values: "test", "demo", "system", "migrated", "VIP", "dev". No enumeration enforced.
phoneMultiple formats: "555-0101" vs "(555) 0102" vs "5550103" vs "+86-10-5550108". Some rows use "N/A" or "000-0000" as placeholders instead of proper null handling.
signup_dateMultiple date formats: "2024-01-15" vs "01/20/2024" vs "2024/04/15" vs "May 1 2024". A parser assuming ISO 8601 will fail on at least 8 rows.
name"Jane Smith" appears in rows 1, 3, 12, 29, 41 - same person? "Bob Johnson" vs "Robert Johnson" vs "Robert 'Bob' Johnson" - how many people? "ACME Corp" vs "acme corporation" - same company?
customer_idAuto-increment integers scoped to this export. Row 1's customer_id "1" means nothing in any other system. Level 1 Semantic Identity.

Part 2: Constraint Enforcement Failures

Row(s)Violation
Row 7Missing name (empty field).
Row 9Status typo: "actve" instead of "active". No enumeration constraint.
Row 13Test account in production data. Name "Test Account", status "test".
Row 19Name is "NULL" (the string). Signup date "1900-01-01" is a placeholder.
Row 28Test data: status "demo", balance 999999.99, region "Demo".
Row 40System placeholder: name "Placeholder Entry", date "1970-01-01" (Unix epoch).
Row 42Dev data: name "TESTING", status "dev", negative balance.

At least 11 rows would fail basic constraint enforcement. This is Level 1-2 Constraint Enforcement.

Part 3: Identity Instability

EntityRepresentationsRows
Jane SmithDifferent emails, same phone in some rows1, 3, 12, 29, 41
Bob/Robert Johnson"Bob Johnson", "Robert Johnson", "Bob Johnson Jr", "Robert 'Bob' Johnson"2, 4, 15, 30
ACME Corp"ACME Corp", "acme corporation", "Acme Corporation" with different emails5, 6, 22, 31
Patricia/Pat Williams"Patricia Williams" vs "pat williams"32, 33
Michael/Mike Brown"Michael Brown" vs "mike brown", same email36, 37
Global Solutions"Global Solutions LLC" vs "global solutions"16, 17

The customer_id column provides no help - each duplicate has a different ID. This is Level 1 Semantic Identity.

Part 4: Floor Constraint Application

DimensionWorking ScoreEvidence
Schema IntegrityLevel 2A schema exists (columns defined) but not enforced at write time - test data, placeholders, and inconsistent formats accepted.
Constraint EnforcementLevel 1No enumeration on status, no format enforcement on phone or date, no null handling policy.
Semantic IdentityLevel 1Auto-increment IDs scoped to one database. Duplicate entities have different IDs.

Floor Constraint

min(2, 1, 1) = Level 1. All derived dimensions (Provenance, Interoperability, Governance) are functionally capped at Level 1. You cannot trace the provenance of data you cannot identify. You cannot share data whose meaning is ambiguous. You cannot govern what you cannot constrain.

Recommended first intervention: Semantic Identity (Level 1 to Level 3 via CUID2 assignment), then Constraint Enforcement (enumeration lists, format validation, null handling), then Schema Integrity (versioned, enforced at write time).