

A

Cohort retention is a common product analytics pattern used to measure how well users return after signup or first activity. Interviewers ask this to evaluate whether you can translate a business metric into a clean SQL workflow.
Explain how you would structure a SQL analysis to measure retention by cohort. Your answer should cover:
Focus on the SQL design and reasoning rather than a single exact query. The interviewer expects a practical explanation of the steps, common table structure, and the main pitfalls that affect correctness.
The first step is assigning each user to a cohort, usually based on their signup date or first qualifying event. This creates a stable reference point for comparing later activity across users who started in the same period.
SELECT user_id, DATE_TRUNC('month', MIN(event_date)) AS cohort_month
FROM user_events
GROUP BY user_id;
Retention depends on a clear definition of what counts as returning activity, such as any session, purchase, or app open in a later week or month. The SQL must distinguish the cohort event from subsequent qualifying activity.
CASE WHEN event_type IN ('session', 'purchase') THEN 1 ELSE 0 END
Retention is measured relative to the cohort start, so each activity row needs a period number like month 0, month 1, or week 2. This lets you compare users across cohorts on the same lifecycle timeline instead of calendar dates.
((EXTRACT(YEAR FROM activity_month) - EXTRACT(YEAR FROM cohort_month)) * 12 + (EXTRACT(MONTH FROM activity_month) - EXTRACT(MONTH FROM cohort_month))) AS month_number
Users often generate multiple events in the same period, so retention should usually count distinct users, not raw events. Without deduplication, retention rates can be inflated and inconsistent across cohorts.
COUNT(DISTINCT user_id) AS retained_users
After counting retained users by cohort and period, divide by the cohort size to get the retention rate. This normalization is what makes cohorts comparable even when they have different numbers of users.
retained_users::numeric / NULLIF(cohort_size, 0) AS retention_rate