Merge Customer Keys with Normalization

Task

You are given two large customer datasets whose primary key values use different formatting conventions. Write a PostgreSQL query that normalizes the keys, merges the records, and returns the matched customer rows with a single canonical key. Treat keys as matching after removing spaces, hyphens, and underscores, and after converting to uppercase. Use a LEFT JOIN so every row from the source customer table is preserved, even when there is no match in the reference table.

Schema

table	column	type	description
source_customers	source_customer_id	VARCHAR(20)	Primary key in the source system, inconsistently formatted
source_customers	full_name	VARCHAR(100)	Customer name from the source system
source_customers	email	VARCHAR(255)	Customer email address
source_customers	status	VARCHAR(20)	Source customer status
reference_customers	reference_customer_id	VARCHAR(20)	Primary key in the reference system, inconsistently formatted
reference_customers	canonical_customer_id	VARCHAR(20)	Standardized customer identifier
reference_customers	account_tier	VARCHAR(20)	Customer tier in the reference system
reference_customers	region	VARCHAR(50)	Customer region

Sample data

source_customer_id	full_name	email	status
`ab-1001`	Ava Chen	ava@example.com	active
`AB_1002`	Ben Ortiz	ben@example.com	active
`ab 1003`	Cara Singh	cara@example.com	inactive
`ZZ-9999`	Dana Fox	dana@example.com	active

reference_customer_id	canonical_customer_id	account_tier	region
`AB1001`	CUST-001	gold	west
`ab1002`	CUST-002	silver	east
`AB-1004`	CUST-004	bronze	south
`AA-0001`	CUST-900	gold	north

Expected output

canonical_customer_id	source_customer_id	reference_customer_id	full_name	email	account_tier	region
CUST-001	ab-1001	AB1001	Ava Chen	ava@example.com	gold	west
CUST-002	AB_1002	ab1002	Ben Ortiz	ben@example.com	silver	east
null	ab 1003	null	Cara Singh	cara@example.com	null	null
null	ZZ-9999	null	Dana Fox	dana@example.com	null	null

Problem

Task

Schema

table	column	type	description
source_customers	source_customer_id	VARCHAR(20)	Primary key in the source system, inconsistently formatted
source_customers	full_name	VARCHAR(100)	Customer name from the source system
source_customers	email	VARCHAR(255)	Customer email address
source_customers	status	VARCHAR(20)	Source customer status
reference_customers	reference_customer_id	VARCHAR(20)	Primary key in the reference system, inconsistently formatted
reference_customers	canonical_customer_id	VARCHAR(20)	Standardized customer identifier
reference_customers	account_tier	VARCHAR(20)	Customer tier in the reference system
reference_customers	region	VARCHAR(50)	Customer region

Sample data

source_customer_id	full_name	email	status
`ab-1001`	Ava Chen	ava@example.com	active
`AB_1002`	Ben Ortiz	ben@example.com	active
`ab 1003`	Cara Singh	cara@example.com	inactive
`ZZ-9999`	Dana Fox	dana@example.com	active

reference_customer_id	canonical_customer_id	account_tier	region
`AB1001`	CUST-001	gold	west
`ab1002`	CUST-002	silver	east
`AB-1004`	CUST-004	bronze	south
`AA-0001`	CUST-900	gold	north

Expected output

canonical_customer_id	source_customer_id	reference_customer_id	full_name	email	account_tier	region
CUST-001	ab-1001	AB1001	Ava Chen	ava@example.com	gold	west
CUST-002	AB_1002	ab1002	Ben Ortiz	ben@example.com	silver	east
null	ab 1003	null	Cara Singh	cara@example.com	null	null
null	ZZ-9999	null	Dana Fox	dana@example.com	null	null

Problem

Task

Schema

table	column	type	description
source_customers	source_customer_id	VARCHAR(20)	Primary key in the source system, inconsistently formatted
source_customers	full_name	VARCHAR(100)	Customer name from the source system
source_customers	email	VARCHAR(255)	Customer email address
source_customers	status	VARCHAR(20)	Source customer status
reference_customers	reference_customer_id	VARCHAR(20)	Primary key in the reference system, inconsistently formatted
reference_customers	canonical_customer_id	VARCHAR(20)	Standardized customer identifier
reference_customers	account_tier	VARCHAR(20)	Customer tier in the reference system
reference_customers	region	VARCHAR(50)	Customer region

Sample data

source_customer_id	full_name	email	status
`ab-1001`	Ava Chen	ava@example.com	active
`AB_1002`	Ben Ortiz	ben@example.com	active
`ab 1003`	Cara Singh	cara@example.com	inactive
`ZZ-9999`	Dana Fox	dana@example.com	active

reference_customer_id	canonical_customer_id	account_tier	region
`AB1001`	CUST-001	gold	west
`ab1002`	CUST-002	silver	east
`AB-1004`	CUST-004	bronze	south
`AA-0001`	CUST-900	gold	north

Expected output

canonical_customer_id	source_customer_id	reference_customer_id	full_name	email	account_tier	region
CUST-001	ab-1001	AB1001	Ava Chen	ava@example.com	gold	west
CUST-002	AB_1002	ab1002	Ben Ortiz	ben@example.com	silver	east
null	ab 1003	null	Cara Singh	cara@example.com	null	null
null	ZZ-9999	null	Dana Fox	dana@example.com	null	null

Problem

Task

Schema

table	column	type	description
source_customers	source_customer_id	VARCHAR(20)	Primary key in the source system, inconsistently formatted
source_customers	full_name	VARCHAR(100)	Customer name from the source system
source_customers	email	VARCHAR(255)	Customer email address
source_customers	status	VARCHAR(20)	Source customer status
reference_customers	reference_customer_id	VARCHAR(20)	Primary key in the reference system, inconsistently formatted
reference_customers	canonical_customer_id	VARCHAR(20)	Standardized customer identifier
reference_customers	account_tier	VARCHAR(20)	Customer tier in the reference system
reference_customers	region	VARCHAR(50)	Customer region

Sample data

source_customer_id	full_name	email	status
`ab-1001`	Ava Chen	ava@example.com	active
`AB_1002`	Ben Ortiz	ben@example.com	active
`ab 1003`	Cara Singh	cara@example.com	inactive
`ZZ-9999`	Dana Fox	dana@example.com	active

reference_customer_id	canonical_customer_id	account_tier	region
`AB1001`	CUST-001	gold	west
`ab1002`	CUST-002	silver	east
`AB-1004`	CUST-004	bronze	south
`AA-0001`	CUST-900	gold	north

Expected output

canonical_customer_id	source_customer_id	reference_customer_id	full_name	email	account_tier	region
CUST-001	ab-1001	AB1001	Ava Chen	ava@example.com	gold	west
CUST-002	AB_1002	ab1002	Ben Ortiz	ben@example.com	silver	east
null	ab 1003	null	Cara Singh	cara@example.com	null	null
null	ZZ-9999	null	Dana Fox	dana@example.com	null	null

Interview Guides

Problem

Task

Schema

Sample data

Expected output

Problem

Task

Schema

Sample data

Expected output

Merge Customer Keys with Normalization

Problem

Task

Schema

Sample data

Expected output

Problem

Task

Schema

Sample data

Expected output