Remove Duplicate Staging Customer Records

Task

You are given a staging table in an Infosys data pipeline that may contain duplicate customer records. The table has no primary key. Write a PostgreSQL query that identifies duplicate rows based on the business columns customer_name, email, city, and signup_date, keeps exactly one copy from each duplicate set, and removes the extras. Before performing the delete, show which rows would be deleted: return each extra row together with a duplicate_rank column giving its position within its duplicate set, so the kept copy has rank 1 and every row returned has rank 2 or higher.

Schema

column	type	description
customer_name	VARCHAR(100)	Customer full name
email	VARCHAR(150)	Customer email address
city	VARCHAR(100)	Customer city
signup_date	DATE	Date the customer signed up
source_system	VARCHAR(50)	Source feed that loaded the row

Sample data

customer_name	email	city	signup_date	source_system
Priya Nair	priya.nair@example.com	Bengaluru	2024-01-10	CRM
Priya Nair	priya.nair@example.com	Bengaluru	2024-01-10	CRM
Arjun Mehta	arjun.mehta@example.com	Pune	2024-02-05	Web
Neha Shah	neha.shah@example.com	Mumbai	2024-03-12	Mobile
Neha Shah	neha.shah@example.com	Mumbai	2024-03-12	Mobile

Expected output

customer_name	email	city	signup_date	source_system	duplicate_rank
Priya Nair	priya.nair@example.com	Bengaluru	2024-01-10	CRM	2
Neha Shah	neha.shah@example.com	Mumbai	2024-03-12	Mobile	2
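
Approach sketch

The following is a minimal sketch of the rank-then-delete idea, not a definitive solution. It runs the SQL against an in-memory SQLite database through Python's sqlite3 module so the example is self-contained. The table name staging_customers is hypothetical, since the problem does not name the table. Because the table has no primary key, the sketch uses SQLite's implicit rowid to tell identical rows apart; in PostgreSQL the analogous choice is usually the ctid system column, and that substitution is an assumption to verify against your own environment. The ROW_NUMBER() window expression itself is standard SQL.

```python
import sqlite3

# Hypothetical table name; the problem statement does not name the staging table.
TABLE = "staging_customers"

# Rank rows inside each duplicate set defined by the four business columns.
# rowid is SQLite's implicit row identifier (assumption: PostgreSQL would use ctid).
RANKED = f"""
    SELECT rowid AS row_id,
           customer_name, email, city, signup_date, source_system,
           ROW_NUMBER() OVER (
               PARTITION BY customer_name, email, city, signup_date
               ORDER BY rowid
           ) AS duplicate_rank
    FROM {TABLE}
"""

SAMPLE_ROWS = [
    ("Priya Nair", "priya.nair@example.com", "Bengaluru", "2024-01-10", "CRM"),
    ("Priya Nair", "priya.nair@example.com", "Bengaluru", "2024-01-10", "CRM"),
    ("Arjun Mehta", "arjun.mehta@example.com", "Pune", "2024-02-05", "Web"),
    ("Neha Shah", "neha.shah@example.com", "Mumbai", "2024-03-12", "Mobile"),
    ("Neha Shah", "neha.shah@example.com", "Mumbai", "2024-03-12", "Mobile"),
]


def load_sample(conn):
    """Create the keyless staging table and load the sample data."""
    conn.execute(
        f"""CREATE TABLE {TABLE} (
                customer_name TEXT, email TEXT, city TEXT,
                signup_date TEXT, source_system TEXT
            )"""
    )
    conn.executemany(f"INSERT INTO {TABLE} VALUES (?, ?, ?, ?, ?)", SAMPLE_ROWS)


def preview_deletes(conn):
    """Return the rows that would be deleted (rank 2 or higher), without deleting."""
    return conn.execute(
        f"""SELECT customer_name, email, city, signup_date,
                   source_system, duplicate_rank
            FROM ({RANKED}) AS ranked
            WHERE duplicate_rank > 1
            ORDER BY row_id"""
    ).fetchall()


def delete_duplicates(conn):
    """Delete every row ranked 2 or higher; return the number of rows removed."""
    cur = conn.execute(
        f"""DELETE FROM {TABLE}
            WHERE rowid IN (
                SELECT row_id FROM ({RANKED}) AS ranked WHERE duplicate_rank > 1
            )"""
    )
    return cur.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_sample(conn)
    for row in preview_deletes(conn):
        print(row)
    print("deleted:", delete_duplicates(conn))
```

Running the preview before the delete matches the task's requirement to show the affected rows first. Both statements share the same ranking expression, so the rows previewed are the rows removed, provided the table does not change between the two steps.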
