Task
You are given a staging table in an Infosys data pipeline. The table has no primary key and may contain duplicate customer records. Write a PostgreSQL query that identifies duplicate rows based on the business columns customer_name, email, city, and signup_date (source_system is not part of the key), keeps one copy of each duplicate set, and removes the extras. Before performing the delete, your result should show which rows would be deleted.
Schema
| column | type | description |
|---|---|---|
| customer_name | VARCHAR(100) | Customer full name |
| email | VARCHAR(150) | Customer email address |
| city | VARCHAR(100) | Customer city |
| signup_date | DATE | Date the customer signed up |
| source_system | VARCHAR(50) | Source feed that loaded the row |
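A minimal sketch of one common approach, under stated assumptions. Because the table has no primary key, each copy is told apart by the engine's physical row identifier (`ctid` in PostgreSQL). `ROW_NUMBER()` numbers the copies within each (customer_name, email, city, signup_date) group, and every row with a number above 1 is an extra. The table name `customer_staging`, the helpers `build_queries` and `demo`, and the rows inserted below are all hypothetical; they are not the task's sample data. To keep the block self-contained and runnable without a PostgreSQL server, the demo executes the same statements on SQLite and uses SQLite's `rowid` in place of `ctid`.

```python
import sqlite3

# The four business columns that define a duplicate (from the task).
# source_system is deliberately left out of the key.
KEY_COLS = "customer_name, email, city, signup_date"
ALL_COLS = KEY_COLS + ", source_system"


def build_queries(table: str, row_id: str) -> tuple[str, str]:
    """Return (preview_sql, delete_sql) for the hypothetical `table`.

    `row_id` is the engine's physical row identifier standing in for the
    missing primary key: "ctid" in PostgreSQL, "rowid" in SQLite.
    """
    # Number the copies inside each duplicate set; the copy with the lowest
    # physical row id gets rn = 1 and is the one that is kept.
    ranked = (
        f"SELECT {row_id} AS row_id, {ALL_COLS}, "
        f"ROW_NUMBER() OVER (PARTITION BY {KEY_COLS} ORDER BY {row_id}) AS rn "
        f"FROM {table}"
    )
    # Step 1: list every extra copy (rn > 1) without modifying the table.
    preview_sql = (
        f"SELECT row_id, {ALL_COLS} FROM ({ranked}) AS ranked "
        f"WHERE rn > 1 ORDER BY row_id"
    )
    # Step 2: delete exactly the rows that the preview returns.
    delete_sql = (
        f"DELETE FROM {table} WHERE {row_id} IN "
        f"(SELECT row_id FROM ({ranked}) AS ranked WHERE rn > 1)"
    )
    return preview_sql, delete_sql


def demo() -> tuple[list[tuple], list[int]]:
    """Run both statements on an in-memory SQLite table with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer_staging (customer_name VARCHAR(100), "
        "email VARCHAR(150), city VARCHAR(100), signup_date DATE, "
        "source_system VARCHAR(50))"
    )
    rows = [  # illustrative rows only, not the task's sample data
        ("Asha Rao", "asha@example.com", "Pune", "2024-01-05", "crm"),
        ("Asha Rao", "asha@example.com", "Pune", "2024-01-05", "web"),
        ("Ravi Kumar", "ravi@example.com", "Mysuru", "2024-02-10", "crm"),
        ("Asha Rao", "asha@example.com", "Pune", "2024-01-05", "crm"),
        ("Ravi Kumar", "ravi@example.com", "Bengaluru", "2024-02-10", "crm"),
    ]
    conn.executemany("INSERT INTO customer_staging VALUES (?, ?, ?, ?, ?)", rows)

    preview_sql, delete_sql = build_queries("customer_staging", "rowid")
    to_delete = conn.execute(preview_sql).fetchall()  # inspect before deleting
    with conn:  # commits on success, rolls back if the delete raises
        conn.execute(delete_sql)
    kept = [r[0] for r in conn.execute(
        "SELECT rowid FROM customer_staging ORDER BY rowid")]
    conn.close()
    return to_delete, kept


if __name__ == "__main__":
    to_delete, kept = demo()
    print("would delete:", to_delete)
    print("kept row ids:", kept)
    # The PostgreSQL form of the same two statements uses ctid.
    for sql in build_queries("customer_staging", "ctid"):
        print(sql)
```

With these made-up rows, the preview lists the second and fourth "Asha Rao" rows and the delete leaves row ids 1, 3 and 5. In PostgreSQL, it is safest to run the preview and the delete inside one transaction, because `ctid` identifies a physical row version and changes when a row is updated or the table is rewritten (for example by `VACUUM FULL`). Which copy survives is simply the one with the lowest row id; to prefer a particular `source_system`, put that column first in the window's `ORDER BY`. Note also that `PARTITION BY` groups NULLs together, so rows that match except that both have a NULL email are treated as duplicates.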
Sample data
Expected output