
You’re a data engineer at a fast-growing e-commerce retailer (~3M monthly active users) that sells direct-to-consumer. Marketing spends heavily on paid search and email, but the company can’t reliably connect anonymous web behavior (Google Analytics-style events) to known customers (CRM) once they purchase or log in. This leads to mis-attributed CAC/LTV, broken retargeting audiences, and inconsistent executive reporting.
Your warehouse receives:
ga_client_id (cookie/device identifier).ga_client_id to customer_id when a user logs in or completes checkout.Because users can browse anonymously on multiple devices, a single customer may map to multiple ga_client_ids over time. You want a single customer view that rolls up web behavior and revenue.
Write a SQL query that produces a unified customer-level dataset for January 2025.
customer_id.customer_id, emailfirst_touch_channel: the channel of the customer’s earliest known web event (across all mapped GA client IDs)sessions_jan_2025: distinct session count in Jan 2025pageviews_jan_2025: total page_view events in Jan 2025orders_jan_2025: number of orders placed in Jan 2025revenue_jan_2025: sum of order totals in Jan 2025event_ts; if ties, choose lexicographically smallest channel.crm_customers| column | type | description |
|---|---|---|
| customer_id | INT | CRM customer primary key |
| VARCHAR(255) | Customer email | |
| created_at | TIMESTAMP | When the CRM record was created |
crm_orders| column | type | description |
|---|---|---|
| order_id | BIGINT | Order primary key |
| customer_id | INT | FK to crm_customers |
| order_ts | TIMESTAMP | Order timestamp |
| order_total | DECIMAL(10,2) | Total charged amount |
ga_events| column | type | description |
|---|---|---|
| event_id | BIGINT | Event primary key |
| ga_client_id | VARCHAR(64) | GA client/cookie identifier |
| session_id | VARCHAR(64) | Session identifier |
| event_ts | TIMESTAMP | Event timestamp |
| event_name | VARCHAR(50) | e.g., page_view, purchase |
| channel | VARCHAR(50) | e.g., Paid Search, Email, Organic Search |
identity_map| column | type | description |
|---|---|---|
| ga_client_id | VARCHAR(64) | GA client identifier |
| customer_id | INT | FK to crm_customers |
| first_seen_ts | TIMESTAMP | First time we observed this mapping |
crm_customers| customer_id | created_at | |
|---|---|---|
| 101 | ava.nguyen@example.com | 2024-11-20 09:10:00 |
| 102 | ben.carter@example.com | 2025-01-03 12:00:00 |
| 103 | chloe.patel@example.com | 2025-01-28 08:30:00 |
crm_orders| order_id | customer_id | order_ts | order_total |
|---|---|---|---|
| 9001 | 101 | 2025-01-05 10:15:00 | 120.00 |
| 9002 | 101 | 2025-02-02 14:20:00 | 80.00 |
| 9003 | 103 | 2025-01-30 16:40:00 | 45.50 |
ga_events| event_id | ga_client_id | session_id | event_ts | event_name | channel |
|---|---|---|---|---|---|
| 5001 | GA1 | S1 | 2024-12-29 18:00:00 | page_view | Paid Search |
| 5002 | GA1 | S2 | 2025-01-04 09:00:00 | page_view | Paid Search |
| 5003 | GA2 | S3 | 2025-01-06 11:00:00 | page_view | |
| 5004 | GA3 | S4 | 2025-01-10 13:00:00 | page_view | Organic Search |
| 5005 | GA3 | S4 | 2025-01-10 13:05:00 | page_view | Organic Search |
identity_map| ga_client_id | customer_id | first_seen_ts |
|---|---|---|
| GA1 | 101 | 2024-12-29 18:05:00 |
| GA2 | 101 | 2025-01-06 11:05:00 |
| GA3 | 102 | 2025-01-10 13:10:00 |
| customer_id | first_touch_channel | sessions_jan_2025 | pageviews_jan_2025 | orders_jan_2025 | revenue_jan_2025 | |
|---|---|---|---|---|---|---|
| 101 | ava.nguyen@example.com | Paid Search | 2 | 2 | 1 | 120.00 |
| 102 | ben.carter@example.com | Organic Search | 1 | 2 | 0 | 0.00 |
| 103 | chloe.patel@example.com | NULL | 0 | 0 | 1 | 45.50 |
| Column | Type | Description |
|---|---|---|
| customer_idPK | INT | CRM customer primary key |
| VARCHAR(255) | Customer email address | |
| created_at | TIMESTAMP | Timestamp when the customer record was created |
| Column | Type | Description |
|---|---|---|
| order_idPK | BIGINT | Order primary key |
| customer_id | INT | Customer placing the order (FK to crm_customers) |
| order_ts | TIMESTAMP | Order timestamp |
| order_total | DECIMAL(10,2) | Total charged amount for the order |
| Column | Type | Description |
|---|---|---|
| event_idPK | BIGINT | Event primary key |
| ga_client_id | VARCHAR(64) | GA client/cookie identifier |
| session_id | VARCHAR(64) | Session identifier |
| event_ts | TIMESTAMP | Event timestamp |
| event_name | VARCHAR(50) | Event type (e.g., page_view) |
| channel | VARCHAR(50) | Marketing channel attribution for the event |
| Column | Type | Description |
|---|---|---|
| ga_client_idPK | VARCHAR(64) | GA client identifier (one side of the crosswalk) |
| customer_id | INT | Mapped CRM customer_id |
| first_seen_ts | TIMESTAMP | First time this mapping was observed |
{"ga_events":[[5001,"GA1","S1","2024-12-29 18:00:00","page_view","Paid Search"],[5002,"GA1","S2","2025-01-04 09:00:00","page_view","Paid Search"],[5003,"GA2","S3","2025-01-06 11:00:00","page_view","Email"],[5004,"GA3","S4","2025-01-10 13:00:00","page_view","Organic Search"],[5005,"GA3","S4","2025-01-10 13:05:00","page_view","Organic Search"]],"crm_orders":[[9001,101,"2025-01-05 10:15:00",120],[9002,101,"2025-02-02 14:20:00",80],[9003,103,"2025-01-30 16:40:00",45.5]],"identity_map":[["GA1",101,"2Output[[101,"ava.nguyen@example.com","Paid Search",2,2,1,120],[102,"ben.carter@example.com","Organic Search",1,2,0,0],[103,"chloe.patel@example.com",null,0,0,1,45.5]]