Business Context
StreamBox, a subscription video platform, saw lower retention in January than in December and wants to know whether retention is actually declining or the drop is just normal post-holiday seasonality.
Problem Statement
You are given monthly cohort retention rates for the same acquisition channel over two years. The product team wants a seasonality-adjusted estimate of year-over-year retention change and a statistical test of whether the underlying retention level has improved.
Given Data
The table below shows month-level Day-30 retention for Jan-Jun in two consecutive years.
| Month | Year 1 Retention | Year 2 Retention | Year 1 Cohort Size | Year 2 Cohort Size |
|---|---|---|---|---|
| Jan | 41.2% | 43.1% | 12,400 | 12,900 |
| Feb | 42.0% | 44.0% | 11,800 | 12,100 |
| Mar | 44.1% | 45.0% | 13,100 | 13,400 |
| Apr | 45.0% | 46.2% | 12,900 | 13,000 |
| May | 46.3% | 47.1% | 13,500 | 13,700 |
| Jun | 47.1% | 48.0% | 13,200 | 13,500 |
Assume the monthly pattern is seasonal and repeats similarly across years.
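As a quick illustration of the table, the sketch below computes the naive per-year averages and shows how a mismatched month mix can distort the comparison. The variable names and the specific mismatched comparison (Year 1 Jan-Mar against Year 2 Apr-Jun) are assumptions chosen for illustration; they are not part of the original problem.

```python
# Day-30 retention (%) copied from the table, in month order Jan..Jun.
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
year1 = [41.2, 42.0, 44.1, 45.0, 46.3, 47.1]
year2 = [43.1, 44.0, 45.0, 46.2, 47.1, 48.0]


def mean(values):
    """Unweighted arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


# Naive averages: every month counts equally, cohort sizes are ignored.
naive_y1 = mean(year1)
naive_y2 = mean(year2)
naive_diff = naive_y2 - naive_y1

# Hypothetical mismatched comparison (an assumption for illustration):
# Year 1 early months (Jan-Mar) against Year 2 late months (Apr-Jun).
# The seasonal climb from winter to summer is counted as if it were lift.
mismatched_diff = mean(year2[3:]) - mean(year1[:3])

print(f"Naive Year 1 average: {naive_y1:.2f}%")
print(f"Naive Year 2 average: {naive_y2:.2f}%")
print(f"Naive YoY difference: {naive_diff:.2f} pp")
print(f"Mismatched-mix difference: {mismatched_diff:.2f} pp")
```

Because both years here cover the same six months, the naive difference (roughly 1.3 percentage points) already compares like with like. The mismatched comparison comes out several times larger, which is the kind of distortion seasonality can introduce when the month mix differs.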
Requirements
- Compute the naive average retention for each year and the naive year-over-year difference.
- Explain why comparing December to January, or even raw averages across different month mixes, can be misleading when seasonality exists.
- Estimate the seasonality-adjusted year-over-year lift by comparing matched months.
- Construct a 95% confidence interval for the mean matched-month lift.
- Test whether the average matched-month lift is greater than 0 at α=0.05 using a paired t-test.
- State whether the evidence supports a real retention improvement after accounting for seasonality.
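The matched-month requirements above can be sketched as follows. This is a minimal standard-library version, assuming the six monthly differences are treated as the paired sample; the critical values are hardcoded from a standard Student's t table for 5 degrees of freedom rather than computed. With SciPy available, `scipy.stats.ttest_rel(year2, year1, alternative="greater")` would give an exact p-value instead of a critical-value comparison.

```python
import math

# Day-30 retention (%) copied from the table, in month order Jan..Jun.
year1 = [41.2, 42.0, 44.1, 45.0, 46.3, 47.1]
year2 = [43.1, 44.0, 45.0, 46.2, 47.1, 48.0]

# Matched-month lift in percentage points (Year 2 minus Year 1).
diffs = [y2 - y1 for y1, y2 in zip(year1, year2)]
n = len(diffs)

mean_diff = sum(diffs) / n
# Sample standard deviation of the differences (n - 1 in the denominator).
sd_diff = math.sqrt(sum((d - mean_diff) ** 2 for d in diffs) / (n - 1))
se_diff = sd_diff / math.sqrt(n)

# Paired t statistic for H0: mean lift = 0 against H1: mean lift > 0.
t_stat = mean_diff / se_diff

# Student's t critical values for n - 1 = 5 degrees of freedom,
# taken from a standard t table (hardcoded, not computed here).
T_CRIT_TWO_SIDED_95 = 2.571  # 0.975 quantile, used for the 95% CI
T_CRIT_ONE_SIDED_05 = 2.015  # 0.95 quantile, used for the one-sided test

ci_low = mean_diff - T_CRIT_TWO_SIDED_95 * se_diff
ci_high = mean_diff + T_CRIT_TWO_SIDED_95 * se_diff
reject_null = t_stat > T_CRIT_ONE_SIDED_05

print(f"Mean matched-month lift: {mean_diff:.2f} pp")
print(f"Standard error: {se_diff:.3f} pp")
print(f"95% CI: ({ci_low:.2f}, {ci_high:.2f}) pp")
print(f"t statistic (df={n - 1}): {t_stat:.2f}")
print(f"Reject H0 at alpha=0.05: {reject_null}")
```

On this data the mean lift is about 1.28 percentage points, the interval stays above zero, and the t statistic exceeds the one-sided critical value, so the sketch points toward a real improvement under the stated assumptions. It ignores cohort sizes; a weighted analysis would be a separate design choice.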
Assumptions
- Each month pair represents the same seasonal position across years.
- Monthly matched differences are approximately independent.
- A paired t-test is appropriate for the six month-level differences.
- Cohort definitions and measurement windows are consistent across both years.
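With only six differences, the normality behind the paired t-test is hard to verify. One possible sanity check, not part of the original requirements, is an exact sign-flip (randomization) test, which swaps the t-distribution assumption for symmetry of the differences around zero under the null. The sketch below is an assumption-laden illustration of that alternative, not a replacement for the requested t-test.

```python
from itertools import product

# Matched-month lifts in percentage points (Year 2 minus Year 1),
# derived from the table for Jan..Jun.
diffs = [1.9, 2.0, 0.9, 1.2, 0.8, 0.9]

observed_sum = sum(diffs)

# Under H0 (no lift, differences symmetric around 0) each difference is
# equally likely to be positive or negative. Enumerate all 2^6 sign
# assignments and count those at least as extreme as the observed sum.
total = 0
as_extreme = 0
for signs in product((1, -1), repeat=len(diffs)):
    total += 1
    flipped_sum = sum(s * d for s, d in zip(signs, diffs))
    # Small tolerance guards against floating-point rounding.
    if flipped_sum >= observed_sum - 1e-12:
        as_extreme += 1

# One-sided exact p-value for H1: mean lift > 0.
p_value = as_extreme / total

print(f"Sign assignments enumerated: {total}")
print(f"Assignments at least as extreme: {as_extreme}")
print(f"One-sided exact p-value: {p_value:.4f}")
```

Because all six lifts are positive, only the unflipped assignment is as extreme as the observed data, so the p-value equals 1/64. That is the smallest value this test can produce with six pairs, and it is still below 0.05.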