Context
At FitPulse, a subscription fitness app, the growth team wants to test a redesigned onboarding flow that adds a personalized goal-setting step before the paywall. Leadership wants confidence that any measured lift is trustworthy enough to justify a full rollout.
Hypothesis Seed
The team believes the extra personalization will increase trial-start conversion because users will better understand the value of the app before seeing pricing. However, the added step could also increase drop-off, delay activation, or create spillovers if invited household members discuss the new flow.
Constraints
- Eligible traffic: 120,000 new onboarding users per day
- 70% iOS, 30% Android
- Maximum experiment duration: 14 days
- Randomization must be decided before launch; engineering can support user-level or household-level assignment, but not both
- False positives are costly because onboarding changes require legal review and app-store resubmission
- False negatives are also meaningful because Q3 growth targets depend on improving paid conversion
- Baseline trial-start conversion from onboarding is 24%
- The smallest business-relevant lift is 4% relative (24% → 24.96% absolute, roughly a one-percentage-point increase)
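A quick feasibility check under these constraints can be sketched with the standard two-proportion normal approximation. Note that α = 0.05 (two-sided) and 80% power are assumed conventions here, not values given in the brief:

```python
from statistics import NormalDist

# Assumed test parameters (not specified in the brief)
ALPHA = 0.05   # two-sided significance level
POWER = 0.80   # 1 - beta

p1 = 0.24          # baseline trial-start conversion
p2 = p1 * 1.04     # 4% relative lift -> 0.2496 absolute
delta = p2 - p1    # absolute MDE, ~0.96 percentage points

z_alpha = NormalDist().inv_cdf(1 - ALPHA / 2)
z_beta = NormalDist().inv_cdf(POWER)

# Pooled-variance form of the two-proportion sample-size formula
p_bar = (p1 + p2) / 2
n_per_arm = (
    (z_alpha * (2 * p_bar * (1 - p_bar)) ** 0.5
     + z_beta * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2
) / delta ** 2

daily_per_arm = 120_000 / 2   # assuming a 50/50 split of eligible traffic
days_needed = n_per_arm / daily_per_arm

print(f"n per arm = {n_per_arm:,.0f}")       # roughly 31-32k users per arm
print(f"days to reach n = {days_needed:.2f}")
```

Under these assumptions, traffic is not the binding constraint: the sample is reached in well under a day, so the duration choice is driven more by capturing at least one full weekly cycle than by raw sample size.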
Deliverables
- Define clear null and alternative hypotheses, and state whether you would use a one-sided or two-sided test.
- Specify the primary metric, 2-4 guardrail metrics, and any secondary metrics. Include the unit of analysis and an explicit MDE.
- Calculate the required sample size per arm and estimate whether the test can finish within 14 days given available traffic.
- Choose the unit of randomization, allocation strategy, duration, and any stratification. Explain why this design makes the experiment trustworthy.
- Pre-register an analysis plan: statistical test, peeking policy, multiple-comparison policy, SRM checks, and how you would handle novelty effects, network interference, or SUTVA concerns.
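The SRM check called for in the analysis plan can be sketched as a chi-square goodness-of-fit test on assignment counts. The 0.001 alert threshold below is a common industry convention, assumed here rather than taken from the brief:

```python
import math

def srm_check(n_control: int, n_treatment: int,
              expected_ratio: float = 0.5,
              alert_p: float = 0.001) -> bool:
    """Return True if a sample-ratio mismatch is detected.

    Chi-square goodness-of-fit with 1 degree of freedom against the
    planned allocation; for 1 df the p-value is erfc(sqrt(stat / 2)).
    """
    total = n_control + n_treatment
    expected_c = total * expected_ratio
    expected_t = total * (1 - expected_ratio)
    stat = ((n_control - expected_c) ** 2 / expected_c
            + (n_treatment - expected_t) ** 2 / expected_t)
    p_value = math.erfc(math.sqrt(stat / 2))
    return p_value < alert_p

# A 50.4/49.6 split on ~840k users (7 days of traffic) is already suspicious:
print(srm_check(423_360, 416_640))  # True -> halt and debug assignment
```

A triggered SRM check should stop the analysis outright: a skewed split means the randomization itself is broken, and no downstream result is trustworthy.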
Your answer should end with a concrete ship / don’t-ship / iterate rule that respects guardrails, not just the primary metric.
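One shape such a rule could take is a two-proportion z-test on the primary metric gated by a guardrail veto. The α, MDE gate, and the boolean guardrail summary below are illustrative assumptions, not part of the brief:

```python
from statistics import NormalDist

def two_prop_z_pvalue(x1: int, n1: int, x2: int, n2: int) -> float:
    """Two-sided p-value for a pooled two-proportion z-test."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    se = (p_pool * (1 - p_pool) * (1 / n1 + 1 / n2)) ** 0.5
    z = (p2 - p1) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def decide(x_ctrl: int, n_ctrl: int, x_trt: int, n_trt: int,
           guardrails_ok: bool,
           alpha: float = 0.05, mde_rel: float = 0.04) -> str:
    """Ship / don't-ship / iterate rule that respects guardrails."""
    p_ctrl, p_trt = x_ctrl / n_ctrl, x_trt / n_trt
    lift = (p_trt - p_ctrl) / p_ctrl
    p_value = two_prop_z_pvalue(x_ctrl, n_ctrl, x_trt, n_trt)
    if not guardrails_ok:
        return "don't ship"   # a guardrail breach vetoes any primary lift
    if p_value < alpha and lift >= mde_rel:
        return "ship"
    if p_value < alpha and lift > 0:
        return "iterate"      # real but sub-MDE lift: refine the flow
    return "don't ship"

# A significant 5% relative lift with healthy guardrails clears the bar:
print(decide(7_560, 31_500, 7_938, 31_500, guardrails_ok=True))   # ship
# The same lift with a breached guardrail does not:
print(decide(7_560, 31_500, 7_938, 31_500, guardrails_ok=False))  # don't ship
```

The key property to preserve in any variant of this rule is the ordering: guardrails are checked first and can only veto, never rescue, the primary result.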