Task
NovaCart stores customer signups in a raw intake table and sometimes receives duplicate records for the same person. Write a SQL query to identify duplicates and mark which rows should be kept versus flagged for review.
Requirements
- Treat records as duplicates when they have the same
email_address and signup_date.
- Keep the row with the earliest
created_at; if there is a tie, keep the smallest customer_id.
- Return only duplicate groups, including both the kept row and the rows that should be flagged.
- Output the duplicate group key, row rank within the group, and a
record_status column with values keep or flag_duplicate.
Table Definitions
customer_signups
| Column | Type | Description |
|---|
| customer_id | INT | Unique row identifier |
| full_name | VARCHAR(100) | Customer name as received from source systems |
| email_address | VARCHAR(255) | Customer email; may be NULL |
| signup_date | DATE | Date the signup is attributed to |
| source_system | VARCHAR(50) | Source that created the record |
| created_at | TIMESTAMP | Time the row was inserted |
| is_active | BOOLEAN | Whether the signup is currently active |
Sample Data
| customer_id | full_name | email_address | signup_date | source_system | created_at | is_active |
|---|
| 105 | Ava Reed | ava@example.com | 2024-02-03 | mobile_app | 2024-02-03 09:00:00 | true |
| 101 | Ava Reed | ava@example.com | 2024-02-03 | web | 2024-02-03 08:15:00 | true |
| 108 | Noah Kim | noah@example.com | 2024-02-05 | partner_api | 2024-02-05 12:00:00 | false |
| 103 | Noah Kim | noah@example.com | 2024-02-05 | web | 2024-02-05 10:30:00 | true |
| 110 | Mia Chen | mia@example.com | 2024-02-07 | web | 2024-02-07 11:00:00 | true |
Expected Output