Task
You are given messy event logs from Quora surfaces and asked to prepare a clean daily dataset for analysis. Write a PostgreSQL query that returns, for each event date and Quora surface, the number of valid events, the number of unique valid users, and the number of valid question views. Treat event names case-insensitively, trim extra whitespace from text fields, map question_view, Question View, and view_question to question_view, and map answer_click and Answer Click to answer_click. Exclude rows where the event timestamp is null, the user is inactive, the user does not exist, or the surface is blank/null after trimming.
Use the surface_map table to standardize surface names, keeping only rows that successfully map to a canonical Quora surface.
Schema
raw_events
| column | type | description |
|---|
| event_id | INT | Event identifier |
| user_id | INT | User who generated the event |
| event_name | VARCHAR(100) | Raw event name |
| event_ts | TIMESTAMP | Event timestamp |
| surface_raw | VARCHAR(100) | Raw surface name |
| | |
users
| column | type | description |
|---|
| user_id | INT | User identifier |
| is_active | BOOLEAN | Whether the user is active |
| | |
surface_map
| column | type | description |
|---|
| surface_raw | VARCHAR(100) | Raw surface label |
| canonical_surface | VARCHAR(100) | Standardized Quora surface |
| | |
Sample data
raw_events
| event_id | user_id | event_name | event_ts | surface_raw |
|---|
| 101 | 1 | question_view | 2024-04-01 09:00:00 | Home Feed |
| 102 | 1 | Answer Click | 2024-04-01 09:02:00 | home feed |
| 103 | 2 | view_question | 2024-04-01 11:00:00 | Question Page |
| 104 | 3 | question_view | 2024-04-01 12:00:00 | Spaces |
Expected output
| event_date | canonical_surface | valid_events | unique_valid_users | valid_question_views |
|---|
| 2024-04-01 | Home Feed | 2 | 1 | 1 |
| 2024-04-01 | Question Page | 1 | 1 | 1 |
| 2024-04-02 | Home Feed | 2 | 2 | 1 |