You’re a data engineer at a fintech payments company that processes ~10M card transactions/day. A legacy customer profile system is being migrated into a new analytics warehouse used for credit risk monitoring and regulatory reporting. Before enabling the production cutover, the team runs a mock load (one-day snapshot) from the legacy system into the new dim_customers table.
Because downstream models and compliance dashboards depend on accurate customer identity and status fields, you must validate that the conversion logic (type casting, normalization, and status mapping) produced correct results. The goal is to produce a single SQL query that can be run after each mock load and reviewed by engineers and QA.
Write a SQL query that compares the legacy snapshot (legacy_customers_snapshot) to the newly loaded warehouse dimension (dim_customers) for a given load_date, and outputs reconciliation metrics and mismatch counts.
For load_date = '2026-02-15', your query must return one row with:
load_datelegacy_row_count and dim_row_countmissing_in_dim_count (present in legacy, absent in dim)extra_in_dim_count (present in dim, absent in legacy)email_mismatch_count (case-insensitive compare after trimming)status_mismatch_count (legacy status_code mapped to dim status)total_credit_limit_legacy and total_credit_limit_dim (sum comparison)Status mapping rules (conversion spec):
status_code = 'A' → dim status = 'active'status_code = 'S' → dim status = 'suspended'status_code = 'C' → dim status = 'closed'legacy_customers_snapshot| Column | Type | Description |
|---|---|---|
| customer_id | BIGINT | Stable customer identifier from legacy system |
| VARCHAR(255) | Customer email (may contain mixed case / whitespace) | |
| status_code | CHAR(1) | Legacy status code: A/S/C |
| credit_limit_cents | BIGINT | Credit limit in cents |
| snapshot_date | DATE | Date of the legacy snapshot |
dim_customers| Column | Type | Description |
|---|---|---|
| customer_id | BIGINT | Customer identifier (should match legacy) |
| VARCHAR(255) | Normalized email stored in warehouse | |
| status | VARCHAR(20) | Converted status: active/suspended/closed |
| credit_limit_usd | DECIMAL(12,2) | Credit limit in USD |
| load_date | DATE | Date the record was loaded into the warehouse |
legacy_customers_snapshot| customer_id | status_code | credit_limit_cents | snapshot_date | |
|---|---|---|---|---|
| 101 | Ava@Example.com | A | 500000 | 2026-02-15 |
| 102 | ben@example.com | S | 250000 | 2026-02-15 |
| 103 | chris@example.com | C | 0 | 2026-02-15 |
| 104 | dana@example.com | A | 100000 | 2026-02-15 |
dim_customers| customer_id | status | credit_limit_usd | load_date | |
|---|---|---|---|---|
| 101 | ava@example.com | active | 5000.00 | 2026-02-15 |
| 102 | ben@example.com | active | 2500.00 | 2026-02-15 |
| 103 | chris@example.com | closed | 0.00 | 2026-02-15 |
| 105 | erin@example.com | active | 1200.00 | 2026-02-15 |
| load_date | legacy_row_count | dim_row_count | missing_in_dim_count | extra_in_dim_count | email_mismatch_count | status_mismatch_count | total_credit_limit_legacy | total_credit_limit_dim |
|---|---|---|---|---|---|---|---|---|
| 2026-02-15 | 4 | 4 | 1 | 1 | 0 | 1 | 8500.00 | 8700.00 |