Task
Meta's lead ingestion pipeline for Facebook Lead Ads occasionally creates duplicate submission records for the same lead and form. Write a SQL query to identify the canonical record to keep for each duplicated lead and return only the duplicate rows that should be removed.
Requirements
- Treat records as duplicates when they share the same
lead_id and form_id.
- Preserve the earliest valid submission by ranking rows using
submitted_at ascending, then ingested_at ascending, then submission_id ascending.
- Return only rows that should be deleted, along with campaign and advertiser context.
- Exclude rows tied to inactive campaigns so cleanup only applies to active data.
Table Definitions
lead_submissions
| column | type | description |
|---|
| submission_id | INT | Unique submission row ID |
| lead_id | VARCHAR(20) | Lead identifier from Meta lead capture |
| form_id | VARCHAR(20) | Lead form identifier |
| campaign_id | INT | Associated ad campaign |
| submitted_at | TIMESTAMP | Time the lead was submitted |
| ingested_at | TIMESTAMP | Time the row landed in the warehouse |
| email | VARCHAR(100) | Lead email address |
| phone | VARCHAR(20) | Lead phone number |
| source_surface | VARCHAR(30) | Meta surface where the lead originated |
campaigns
| column | type | description |
|---|
| campaign_id | INT | Campaign identifier |
| advertiser_id | INT | Owning advertiser |
| campaign_name | VARCHAR(100) | Campaign name |
| status | VARCHAR(20) | Campaign status |
| start_date | DATE | Campaign start date |
advertisers
| column | type | description |
|---|
| advertiser_id | INT | Advertiser identifier |
| advertiser_name | VARCHAR(100) | Advertiser name |
| vertical | VARCHAR(50) | Advertiser industry |
| region | VARCHAR(30) | Primary region |
Sample Data
Representative rows are shown in the provided dataset.
Expected Output
| submission_id | lead_id | form_id | campaign_name | advertiser_name | submitted_at | ingested_at | duplicate_rank |
|---|
| 102 | L1001 | F201 | Q1 SMB Lead Gen | Acme Fitness | 2024-03-01 09:00:00 | 2024-03-01 09:02:00 | 2 |
| 104 | L1002 | F201 | Q1 SMB Lead Gen | Acme Fitness | 2024-03-02 10:00:00 | 2024-03-02 10:06:00 | 2 |
| 109 | L1006 | F204 | Reengagement Retargeting | Nova Education | 2024-03-05 14:00:00 | 2024-03-05 14:05:00 | 2 |