Task
You are given two large customer datasets whose primary key values use different formatting conventions. Write a PostgreSQL query that normalizes the keys, merges the records, and returns the matched customer rows with a single canonical key. Treat keys as matching after removing spaces, hyphens, and underscores, and after converting to uppercase. Use a LEFT JOIN so every row from the source customer table is preserved, even when there is no match in the reference table.
Schema
| table | column | type | description |
|---|
| source_customers | source_customer_id | VARCHAR(20) | Primary key in the source system, inconsistently formatted |
| source_customers | full_name | VARCHAR(100) | Customer name from the source system |
| source_customers | email | VARCHAR(255) | Customer email address |
| source_customers | status | VARCHAR(20) | Source customer status |
| reference_customers | reference_customer_id | VARCHAR(20) | Primary key in the reference system, inconsistently formatted |
| reference_customers | canonical_customer_id | VARCHAR(20) | Standardized customer identifier |
| reference_customers | account_tier | VARCHAR(20) | Customer tier in the reference system |
| reference_customers | region | VARCHAR(50) | Customer region |
Sample data
| reference_customer_id | canonical_customer_id | account_tier | region |
|---|
AB1001 | CUST-001 | gold | west |
ab1002 | CUST-002 | silver | east |
AB-1004 | CUST-004 | bronze | south |
AA-0001 | CUST-900 | gold | north |
Expected output