Business Context
You’re a data engineer at a large e-commerce marketplace (~20M registered customers, millions of daily sessions). Multiple upstream systems (web signup, mobile app, call center) write to a shared customer profile store. Due to intermittent API timeouts and partial updates, the customer_profiles table often contains missing values (NULL or empty strings) for key fields like email and phone.
Marketing and fraud teams rely on these fields for order confirmations, account recovery, and risk checks. If missing data isn’t cleaned consistently, the business sees higher support costs, lower deliverability, and weaker fraud detection.
Task
Write a SQL query that produces a cleaned view of customer contact data by imputing missing values using the most recent non-missing value available for the same customer.
Requirements
- Return one row per
customer_id.
- For each customer, output:
customer_id
clean_email
clean_phone
last_updated_at (the latest profile update timestamp for that customer)
- Treat both NULL and empty strings ('') as missing.
- Impute missing values by taking the most recent non-missing value for that customer based on
updated_at.
- If a customer has never had a non-missing value for a field, return NULL for that cleaned field.
- Order results by
customer_id ascending.
Table Definitions
customer_profiles
| Column | Type | Description |
|---|
| profile_event_id | BIGINT | Primary key for each profile update event |
| customer_id | BIGINT | Customer identifier (many events per customer) |
| email | VARCHAR(255) | Email captured in this update (may be NULL/empty) |
| phone | VARCHAR(50) | Phone captured in this update (may be NULL/empty) |
| updated_at | TIMESTAMP | When this profile update event was written |
Sample Data
customer_profiles
| profile_event_id | customer_id | email | phone | updated_at |
|---|
| 9001 | 101 | | 415-555-0101 | 2025-01-03 10:05:00 |
| 9002 | 101 | ava.chen@example.com | | 2025-01-10 09:20:00 |
| 9003 | 102 | NULL | | 2025-01-02 08:00:00 |
| 9004 | 102 | | 212-555-0199 | 2025-01-05 12:30:00 |
| 9005 | 103 | mia.patel@example.com | 646-555-0144 | 2025-01-04 14:10:00 |
Expected Output