Task
StreamWave wants to compare monthly retention across signup cohorts. Write a PostgreSQL query that calculates retention by cohort month over time.
Requirements
- Define each user's cohort as the month of their
signup_date.
- For each cohort, count how many distinct users were active in month 0, month 1, and month 2 based on
activity_date.
- Return the cohort size, retained users, and retention rate for each cohort month and month offset.
- Exclude activity that occurred before a user's signup date, and only include month offsets from 0 to 2.
Table Definitions
users
| column | type | description |
|---|
| user_id | INT | Unique user identifier |
| signup_date | DATE | Date the user signed up |
| acquisition_channel | VARCHAR(50) | Marketing channel for signup |
user_activity
| column | type | description |
|---|
| activity_id | INT | Unique activity event identifier |
| user_id | INT | User tied to the activity event |
| activity_date | DATE | Date of the activity |
| activity_type | VARCHAR(50) | Type of product activity |
Sample Data
users
| user_id | signup_date | acquisition_channel |
|---|
| 3 | 2024-02-20 | Referral |
| 1 | 2024-01-05 | Paid Search |
| 8 | 2024-03-18 | Organic |
| 5 | 2024-03-01 | Paid Search |
user_activity
| activity_id | user_id | activity_date | activity_type |
|---|
| 104 | 1 | 2024-02-10 | stream |
| 101 | 1 | 2024-01-06 | login |
| 111 | 3 | 2024-03-05 | login |
| 120 | 8 | 2024-04-02 | stream |
Expected Output
| cohort_month | month_number | cohort_size | retained_users | retention_rate |
|---|
| 2024-01 | 0 | 2 | 2 | 1.0000 |
| 2024-01 | 1 | 2 | 2 | 1.0000 |
| 2024-01 | 2 | 2 | 1 | 0.5000 |
| 2024-02 | 0 | 3 | 2 | 0.6667 |
| 2024-02 | 1 | 3 | 2 | 0.6667 |
| 2024-02 | 2 | 3 | 1 | 0.3333 |
| 2024-03 | 3 | 2 | 2 | 1.0000 |
Use distinct users for retention counts so multiple activity events in the same month do not inflate results.