Module 6 Lab: Sample Data

Synthetic data for the Module 6 hands-on installation lab. All identifiers, names, phone numbers, and email addresses are fictional. Any resemblance to real persons or businesses is coincidental.

Files

  • sample_csv/clients.csv — 20 client records simulating an Atlas Legal-style intake. Deliberately includes data quality problems so the introspection run produces interesting anomaly flags.
  • sample_csv/matters.csv — 20 matter records linked to clients via client_id.
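
Because the two files are joined on client_id, it can be worth confirming that every matter points at a known client before running introspection. The following is a minimal sketch, not part of the lab tooling. It assumes both files have a header row with a client_id column and that matters.csv has a matter_id column (the matter_id name is an assumption). The inline rows are made-up stand-ins, not the contents of the lab files.

```python
import csv
import io

def orphan_matters(clients_file, matters_file):
    """Return matter rows whose client_id has no match in the clients file."""
    known = {row["client_id"] for row in csv.DictReader(clients_file)}
    return [row for row in csv.DictReader(matters_file)
            if row["client_id"] not in known]

# Illustrative stand-in rows only; column names other than client_id are assumed.
clients_csv = io.StringIO(
    "client_id,client_name\n"
    "1001,Example One\n"
    "1002,Example Two\n"
)
matters_csv = io.StringIO(
    "matter_id,client_id\n"
    "M-0001,1001\n"
    "M-0002,1999\n"
)

orphans = orphan_matters(clients_csv, matters_csv)
print([m["matter_id"] for m in orphans])
```

To try it against the lab data, pass open("sample_csv/clients.csv", newline="") and open("sample_csv/matters.csv", newline="") in place of the in-memory strings.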

Intentional data quality issues

The lab data is engineered to demonstrate the SDC Agents SMB anomaly detection. When you run introspection on the sample files you should see flags including:

  • near_duplicate_identifier on client_name — Maria Gonzalez appears as Maria Gonzalez, Maria Gonzales, and Maria E. Gonzalez (rows 1001, 1002, 1010). James O'Brien appears as James O'Brien, J. O'Brien, and James O Brien (rows 1003, 1008, 1018). Chen Wei appears as Chen Wei (twice) and Wei Chen with email and phone matching across all three (rows 1005, 1006, 1014). Sandra Williams appears as both Sandra Williams and Sandra Williams-Hayes with the same email (rows 1015, 1016). Acme Holdings, Riverdale Cafe, and Patel Family Trust each appear in two slightly different forms.

  • format_drift on phone — formats include (503) 555-0142, 503-555-0142, 5035550199, 503.555.0156, and (503)555-0156. Five different formats coexist.

  • format_drift on case_number — formats include IM-2023-0001, SB2023-004, SB-2023-011, and IM2023-012. Two competing conventions.

  • unparseable_dates on intake_date — row 1005 has 00/00/0000.

  • mixed_types / null_count on phone — row 1007 has an empty phone field.

  • outlier_count on billable_hours in matters.csv — matter M-0018 has 9999 billable hours, simulating a fat-finger error.
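
To build intuition for the format_drift and unparseable_dates flags, the checks can be approximated in a few lines. This is a rough sketch of one common approach (reducing each value to a format "shape" and counting distinct shapes), not the detection logic the product actually uses. The helper names shape and parses_as_date are hypothetical, and the MM/DD/YYYY date format is an assumption. The sample values are the ones quoted in the flag descriptions above.

```python
import re
from collections import Counter
from datetime import datetime

def shape(value: str) -> str:
    """Reduce a value to its format shape: digit runs -> 9, letter runs -> A."""
    value = re.sub(r"[0-9]+", "9", value)
    return re.sub(r"[A-Za-z]+", "A", value)

def parses_as_date(value: str, fmt: str = "%m/%d/%Y") -> bool:
    """True if value is a real calendar date in fmt (the format is assumed)."""
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False

# Values quoted in the flag descriptions above.
phones = ["(503) 555-0142", "503-555-0142", "5035550199",
          "503.555.0156", "(503)555-0156"]
case_numbers = ["IM-2023-0001", "SB2023-004", "SB-2023-011", "IM2023-012"]

phone_shapes = Counter(shape(p) for p in phones)
case_shapes = Counter(shape(c) for c in case_numbers)

print(len(phone_shapes), sorted(phone_shapes))  # distinct phone shapes
print(len(case_shapes), sorted(case_shapes))    # distinct case_number shapes
print(parses_as_date("00/00/0000"))             # month 00 is not a valid date
```

On these values the sketch finds five phone shapes and two case_number shapes, and it rejects 00/00/0000. Collapsing each run of digits to a single 9 means digit count is ignored, so IM-2023-0001 and SB-2023-011 share a shape. A per-character shape would be stricter and would separate them.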

These flags are direct evidence for a Maturity Map dimension scoring exercise. Trainees who run the lab on this data should be able to identify which dimension each flag supports without referring back to Module 2.

Lab procedure

  1. Install SDCforSMB on your laptop following Module 6 section 6.2
  2. Complete the onboarding wizard (use the Axius SDC training SDCStudio wallet credentials provided in the certification portal)
  3. Add a CSV datasource pointing at this sample_csv/ directory
  4. Run introspection
  5. Capture screenshots of:
       • Wizard completion screen
       • Introspection result table showing the anomaly flags above
       • Assembly review screen showing reuse vs mint counts
       • Deployed application health endpoint after Generate Application
  6. Submit screenshots via the certification portal

What to look for in the assembly review

The first run on this data will propose six components: a Person (covering both clients and attorneys), a PhoneNumber, an EmailAddress, an IdentifierCode (covering the case_number variations), a MoneyAmount (for total_billed_usd), and a DurationHours (for billable_hours). Of these, Person, PhoneNumber, and EmailAddress are very likely to match existing components in the Axius SDC reference catalog and be marked as reuse (free). IdentifierCode and the domain-specific types may require minting (billable). The expected impact on the training wallet is $0, because the training wallet covers all minting for certified practitioners in good standing.