Structure Retention Cohort Analysis

Medium

SQL & Data Manipulation

Asked at 6 companies6AggregationsCTEsGroup ByDate Functions

Also asked at

Problem

Context

Cohort retention is a common product analytics pattern used to measure how well users return after signup or first activity. Interviewers ask this to evaluate whether you can translate a business metric into a clean SQL workflow.

Core question

Explain how you would structure a SQL analysis to measure retention by cohort. Your answer should cover:

How to define the cohort date for each user
How to identify retained users in later periods
How to calculate the period offset from the cohort start
How to aggregate results into a cohort retention table or matrix
How to handle edge cases such as multiple events in the same period, missing activity, and partial current periods

Scope guidance

Focus on the SQL design and reasoning rather than a single exact query. The interviewer expects a practical explanation of the steps, common table structure, and the main pitfalls that affect correctness.

Key Concepts

Cohort Assignment

The first step is assigning each user to a cohort, usually based on their signup date or first qualifying event. This creates a stable reference point for comparing later activity across users who started in the same period.

SELECT user_id, DATE_TRUNC('month', MIN(event_date)) AS cohort_month
FROM user_events
GROUP BY user_id;

Retention Event Definition

Retention depends on a clear definition of what counts as returning activity, such as any session, purchase, or app open in a later week or month. The SQL must distinguish the cohort event from subsequent qualifying activity.

CASE WHEN event_type IN ('session', 'purchase') THEN 1 ELSE 0 END

Period Offset Calculation

Retention is measured relative to the cohort start, so each activity row needs a period number like month 0, month 1, or week 2. This lets you compare users across cohorts on the same lifecycle timeline instead of calendar dates.

((EXTRACT(YEAR FROM activity_month) - EXTRACT(YEAR FROM cohort_month)) * 12 + (EXTRACT(MONTH FROM activity_month) - EXTRACT(MONTH FROM cohort_month))) AS month_number

Distinct User Aggregation

Users often generate multiple events in the same period, so retention should usually count distinct users, not raw events. Without deduplication, retention rates can be inflated and inconsistent across cohorts.

COUNT(DISTINCT user_id) AS retained_users

Retention Rate Calculation

After counting retained users by cohort and period, divide by the cohort size to get the retention rate. This normalization is what makes cohorts comparable even when they have different numbers of users.

retained_users::numeric / NULLIF(cohort_size, 0) AS retention_rate

Problem

Context

Core question

Explain how you would structure a SQL analysis to measure retention by cohort. Your answer should cover:

How to define the cohort date for each user
How to identify retained users in later periods
How to calculate the period offset from the cohort start
How to aggregate results into a cohort retention table or matrix
How to handle edge cases such as multiple events in the same period, missing activity, and partial current periods

Scope guidance

Key Concepts

Cohort Assignment

SELECT user_id, DATE_TRUNC('month', MIN(event_date)) AS cohort_month
FROM user_events
GROUP BY user_id;

Retention Event Definition

CASE WHEN event_type IN ('session', 'purchase') THEN 1 ELSE 0 END

Period Offset Calculation

((EXTRACT(YEAR FROM activity_month) - EXTRACT(YEAR FROM cohort_month)) * 12 + (EXTRACT(MONTH FROM activity_month) - EXTRACT(MONTH FROM cohort_month))) AS month_number

Distinct User Aggregation

COUNT(DISTINCT user_id) AS retained_users

Retention Rate Calculation

retained_users::numeric / NULLIF(cohort_size, 0) AS retention_rate

Your answer

Try one AI text evaluation on us

Get structured feedback, scored against a 4-axis rubric. Premium unlocks unlimited.

0 wordstarget ~200

Up next

Compare User Retention by CohortMedium ABuild Monthly Retention Cohort TableHard

Measure Retention After Product LaunchMedium

Next question

Core question

Explain how you would structure a SQL analysis to measure retention by cohort. Your answer should cover:

How to define the cohort date for each user

How to identify retained users in later periods

How to calculate the period offset from the cohort start

How to aggregate results into a cohort retention table or matrix

How to handle edge cases such as multiple events in the same period, missing activity, and partial current periods

Core question

Explain how you would structure a SQL analysis to measure retention by cohort. Your answer should cover:

How to define the cohort date for each user

How to identify retained users in later periods

How to calculate the period offset from the cohort start

How to aggregate results into a cohort retention table or matrix

How to handle edge cases such as multiple events in the same period, missing activity, and partial current periods

Cohort Assignment

SELECT user_id, DATE_TRUNC('month', MIN(event_date)) AS cohort_month
FROM user_events
GROUP BY user_id;

Retention Event Definition

CASE WHEN event_type IN ('session', 'purchase') THEN 1 ELSE 0 END

Period Offset Calculation

((EXTRACT(YEAR FROM activity_month) - EXTRACT(YEAR FROM cohort_month)) * 12 + (EXTRACT(MONTH FROM activity_month) - EXTRACT(MONTH FROM cohort_month))) AS month_number

Distinct User Aggregation

COUNT(DISTINCT user_id) AS retained_users

Retention Rate Calculation

retained_users::numeric / NULLIF(cohort_size, 0) AS retention_rate