Context
StreamFlow, a subscription video app, redesigned its new-user onboarding to reduce friction: the new flow cuts signup from 5 screens to 3 and delays preference collection until after first play. The PM asks whether to launch if the experiment looks directionally positive but is not statistically significant.
Hypothesis Seed
The team believes the shorter onboarding will increase activation by reducing abandonment during account setup. However, it may also lower downstream retention if users provide fewer preferences, so the launch decision cannot rely on a single top-line metric.
Constraints
- Eligible traffic: 120,000 new-user signups per day
- Only 35% of signups (about 42,000 per day) are truly new users who have never installed the app before; exclude re-installs where possible
- Maximum experiment duration: 14 days, because the marketing campaign creative changes after that
- False positive cost: shipping a weaker onboarding to all new users could hurt paid conversion and week-1 retention
- False negative cost: delaying a good onboarding means losing activation during a major acquisition push
- Engineering can support only a simple 50/50 split and one primary decision metric
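The traffic constraints above imply an upper bound on how many users the test can enroll. The sketch below works through that arithmetic. It assumes re-installs can be filtered out completely, so that exactly 35% of daily signups are eligible; the function and variable names are illustrative, not part of the case.

```python
# Planning figures taken from the Constraints list.
DAILY_SIGNUPS = 120_000
TRULY_NEW_SHARE = 0.35  # assumption: re-installs are identified and excluded perfectly
MAX_DAYS = 14
ARMS = 2                # simple 50/50 split


def traffic_budget(daily_signups, new_share, days, arms):
    """Return (eligible users per day, total eligible users, users per arm)."""
    eligible_per_day = round(daily_signups * new_share)
    total_eligible = eligible_per_day * days
    per_arm = total_eligible // arms
    return eligible_per_day, total_eligible, per_arm


eligible_per_day, total_eligible, per_arm = traffic_budget(
    DAILY_SIGNUPS, TRULY_NEW_SHARE, MAX_DAYS, ARMS
)
print(f"Eligible per day: {eligible_per_day:,}")
print(f"Total over {MAX_DAYS} days: {total_eligible:,}")
print(f"Per arm: {per_arm:,}")
```

If re-install detection is imperfect, the true eligible share will differ from 35%, so this figure is best treated as a planning ceiling rather than a guarantee.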
Deliverables
- State the null and alternative hypotheses, and decide whether the test should be one-sided or two-sided.
- Define the primary metric, 2-4 guardrails, and at least one secondary metric. Include a clear MDE for the primary metric.
- Calculate the required sample size and determine whether the 14-day traffic budget is enough.
- Choose the unit of randomization and explain how you would analyze the test, including peeking policy, multiple-comparisons treatment, and what to do if results are directionally positive but not statistically significant.
- List key pitfalls for this experiment, including at least one instrumentation or interference risk, and give a ship / don’t-ship / iterate rule that respects guardrails.
Use the following planning assumptions for your calculations:
- Baseline activation rate (complete signup and start first stream within 24 hours): 32%
- Smallest business-meaningful lift in activation: +1.6 percentage points absolute (32% to 33.6%)
- Baseline paid conversion within 7 days: 8.5%
- Baseline D7 retention: 24%
- Significance level: 0.05
- Desired power: 80%
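As a starting point for the sample-size deliverable, here is a minimal sketch that applies the standard normal-approximation formula for comparing two proportions with equal allocation. It is one common approach, not the only valid one: it uses unpooled variances, and the choice between a one-sided and a two-sided test is left as a parameter because that decision is part of the exercise. The function name and signature are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_arm(p_base, abs_lift, alpha=0.05, power=0.80, two_sided=True):
    """Approximate users needed per arm to detect an absolute lift in a proportion.

    Uses n = (z_alpha + z_power)^2 * (p1*(1-p1) + p2*(1-p2)) / lift^2,
    the unpooled normal approximation for a 50/50 split.
    """
    p_treat = p_base + abs_lift
    tail = alpha / 2 if two_sided else alpha
    z_alpha = NormalDist().inv_cdf(1 - tail)   # critical value for the chosen alpha
    z_power = NormalDist().inv_cdf(power)      # quantile matching the desired power
    variance_sum = p_base * (1 - p_base) + p_treat * (1 - p_treat)
    return ceil((z_alpha + z_power) ** 2 * variance_sum / abs_lift ** 2)


# Planning assumptions from the list above: 32% baseline, +1.6 pp absolute MDE.
n_two_sided = sample_size_per_arm(0.32, 0.016)
n_one_sided = sample_size_per_arm(0.32, 0.016, two_sided=False)
print(f"Two-sided: {n_two_sided:,} per arm")
print(f"One-sided: {n_one_sided:,} per arm")
```

Under these assumptions the two-sided requirement comes to roughly 13,500 users per arm, which is small relative to the roughly 21,000 eligible users per arm per day implied by the 35% new-user share. Traffic is therefore unlikely to be the binding constraint. Duration is more likely to be set by the need to cover full weekly cycles and to let the 7-day paid-conversion and D7-retention guardrails mature.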