Business Context
You’re a data engineer at a fast-growing e-commerce retailer (~3M monthly active users) that sells direct-to-consumer. Marketing spends heavily on paid search and email, but the company can’t reliably connect anonymous web behavior (Google Analytics-style events) to known customers (CRM) once they purchase or log in. This leads to mis-attributed CAC/LTV, broken retargeting audiences, and inconsistent executive reporting.
Your warehouse receives:
- CRM data with customer records and orders.
- Web analytics events (GA-like) keyed by
ga_client_id (cookie/device identifier).
- A crosswalk table mapping
ga_client_id to customer_id when a user logs in or completes checkout.
Because users can browse anonymously on multiple devices, a single customer may map to multiple ga_client_ids over time. You want a single customer view that rolls up web behavior and revenue.
Task
Write a SQL query that produces a unified customer-level dataset for January 2025.
Requirements
- Output one row per
customer_id.
- Include:
customer_id, email
first_touch_channel: the channel of the customer’s earliest known web event (across all mapped GA client IDs)
sessions_jan_2025: distinct session count in Jan 2025
pageviews_jan_2025: total page_view events in Jan 2025
orders_jan_2025: number of orders placed in Jan 2025
revenue_jan_2025: sum of order totals in Jan 2025
- Only count web events that can be attributed to a customer via the mapping table.
- If a customer has no January web events, they should still appear if they have a January order (and web metrics should be 0 / NULL as appropriate).
- Use a deterministic rule for first touch: earliest
event_ts; if ties, choose lexicographically smallest channel.
Table Definitions
crm_customers
| column | type | description |
|---|
| customer_id | INT | CRM customer primary key |
| email | VARCHAR(255) | Customer email |
| created_at | TIMESTAMP | When the CRM record was created |
crm_orders
| column | type | description |
|---|
| order_id | BIGINT | Order primary key |
| customer_id | INT | FK to crm_customers |
| order_ts | TIMESTAMP | Order timestamp |
| order_total | DECIMAL(10,2) | Total charged amount |
ga_events
| column | type | description |
|---|
| event_id | BIGINT | Event primary key |
| ga_client_id | VARCHAR(64) | GA client/cookie identifier |
| session_id | VARCHAR(64) | Session identifier |
| event_ts | TIMESTAMP | Event timestamp |
| event_name | VARCHAR(50) | e.g., page_view, purchase |
| channel | VARCHAR(50) | e.g., Paid Search, Email, Organic Search |
identity_map
| column | type | description |
|---|
| ga_client_id | VARCHAR(64) | GA client identifier |
| customer_id | INT | FK to crm_customers |
| first_seen_ts | TIMESTAMP | First time we observed this mapping |
Sample Data
crm_customers
crm_orders
| order_id | customer_id | order_ts | order_total |
|---|
| 9001 | 101 | 2025-01-05 10:15:00 | 120.00 |
| 9002 | 101 | 2025-02-02 14:20:00 | 80.00 |
| 9003 | 103 | 2025-01-30 16:40:00 | 45.50 |
ga_events
| event_id | ga_client_id | session_id | event_ts | event_name | channel |
|---|
| 5001 | GA1 | S1 | 2024-12-29 18:00:00 | page_view | Paid Search |
| 5002 | GA1 | S2 | 2025-01-04 09:00:00 | page_view | Paid Search |
| 5003 | GA2 | S3 | 2025-01-06 11:00:00 | page_view | Email |
| 5004 | GA3 | S4 | 2025-01-10 13:00:00 | page_view | Organic Search |
| 5005 | GA3 | S4 | 2025-01-10 13:05:00 | page_view | Organic Search |
identity_map
| ga_client_id | customer_id | first_seen_ts |
|---|
| GA1 | 101 | 2024-12-29 18:05:00 |
| GA2 | 101 | 2025-01-06 11:05:00 |
| GA3 | 102 | 2025-01-10 13:10:00 |
Expected Output