Context
Chime is testing a redesigned Credit Builder enrollment prompt in the mobile app. The growth team sees a positive overall lift in enrollment, but early reads suggest the treatment may hurt a strategically important segment: new members in their first 30 days.
Hypothesis Seed
The new prompt simplifies the value proposition and adds stronger social proof. The team believes it will increase completed Credit Builder enrollments without harming downstream activation quality. However, there is concern that the new design may confuse newly onboarded members, even if it helps the broader population.
Constraints
- Eligible traffic: 220,000 app users/day who view the Credit Builder prompt
- Important segment: new members (account age < 30 days), about 25% of eligible traffic
- Baseline overall enrollment conversion: 12.0%
- Baseline new-member enrollment conversion: 9.0%
- Maximum experiment duration: 21 days
- Allocation target: 50/50 control vs treatment
- A false positive is costly: shipping a harmful experience to new members can reduce trust and future activation. A false negative is acceptable if it avoids member harm
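Before designing around these constraints, it helps to sanity-check what they imply for sample size. A minimal sketch using the standard two-proportion normal approximation, with the baselines and effect sizes stated above (function names and the 80% power / 5% alpha defaults are illustrative assumptions, not part of the brief):

```python
from math import ceil

def n_per_arm_superiority(p1, rel_lift, z_alpha=1.95996, z_beta=0.84162):
    """Per-arm n for a two-sided superiority test on two proportions.
    Defaults: alpha = 0.05 two-sided, power = 80%."""
    p2 = p1 * (1 + rel_lift)
    var = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * var / (p2 - p1) ** 2)

def n_per_arm_noninferiority(p, rel_margin, z_alpha=1.64485, z_beta=0.84162):
    """Per-arm n to rule out a relative decline of rel_margin at the
    baseline rate p. Defaults: alpha = 0.05 one-sided, power = 80%."""
    margin = rel_margin * p
    return ceil((z_alpha + z_beta) ** 2 * 2 * p * (1 - p) / margin ** 2)

# Overall: baseline 12.0%, +3% relative lift.
n_overall = n_per_arm_superiority(0.12, 0.03)      # ~130k per arm
# New members: baseline 9.0%, -2% relative non-inferiority margin.
n_new = n_per_arm_noninferiority(0.09, 0.02)       # ~313k per arm

# Days needed at 220k eligible users/day, 50/50 split, new members = 25%.
days_overall = n_overall / (220_000 * 0.5)          # well under a day's headroom
days_new = n_new / (220_000 * 0.25 * 0.5)           # roughly 11-12 days
```

Under these assumptions the overall test powers in about a day, while the new-member non-inferiority bound is the binding constraint at roughly 11-12 days, comfortably inside the 21-day cap.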
Task
- Define the experiment hypothesis, including how you will treat the overall metric versus the important new-member segment.
- Specify the primary metric, guardrails, and at least one secondary metric, with a clear MDE and unit of analysis.
- Calculate the required sample size and whether, within the 21-day cap, the test can:
- detect a +3% relative lift overall, and
- rule out (via a non-inferiority bound) a decline worse than -2% relative in the new-member segment.
- Choose the unit of randomization and write a pre-registered analysis plan covering segment analysis, multiple comparisons, peeking, and SRM checks.
- State a clear ship / don’t ship / iterate rule for the case where the overall result is positive but the new-member segment is negative.
Your answer should explicitly address how you would avoid overreacting to noisy subgroup reads while still protecting a strategically important Chime member segment.
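One piece of the pre-registered plan above, the SRM check, can be sketched directly. A minimal chi-square goodness-of-fit test (1 degree of freedom) against the planned 50/50 split, using only the standard library; the function name and the usual p < 0.001 alert threshold are illustrative conventions, not requirements from the brief:

```python
from math import erf, sqrt

def srm_pvalue(n_control, n_treatment, expected_ratio=0.5):
    """Chi-square goodness-of-fit p-value (1 df) for a sample-ratio
    mismatch check against the planned allocation."""
    total = n_control + n_treatment
    exp_c = total * expected_ratio
    exp_t = total * (1 - expected_ratio)
    chi2 = ((n_control - exp_c) ** 2 / exp_c
            + (n_treatment - exp_t) ** 2 / exp_t)
    # 1-df chi-square survival function expressed via erf.
    return 1 - erf(sqrt(chi2 / 2))

# A perfectly balanced split yields p = 1; a 111k/109k split on
# ~220k/day traffic would trip a conventional p < 0.001 SRM alert,
# signaling an assignment bug rather than a treatment effect.
```

In practice this check would run daily during the experiment; an SRM alert invalidates the read for that period rather than feeding into the ship/don't-ship rule.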