Context
ShopNow, a mid-size e-commerce app, wants to test a simplified mobile checkout that removes one form step. The product team believes this will increase completed purchases, but engineering wants to launch only if the expected gain is large enough to justify the rollout risk and implementation cost.
Hypothesis Seed
The proposed change shortens checkout from 4 screens to 3 for eligible mobile users. PMs expect a modest lift in purchase conversion because fewer users will abandon during address entry, but they are unsure which minimum detectable effect (MDE) to commit to before launch.
Constraints
- Eligible traffic: 120,000 mobile checkout sessions per day
- Randomization split: 50/50 after a 5% canary for instrumentation validation
- Baseline purchase conversion from checkout start to order completion: 24.0%
- Maximum experiment duration: 14 days
- False positives are costly because a bad checkout experience can hurt revenue immediately
- False negatives are also meaningful because the redesign required 6 engineer-weeks and has opportunity cost
- The team wants 80% power and a two-sided 5% significance level
- The experiment must cover at least one full weekly cycle
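The constraints above fix a per-arm traffic budget, and the 50/50 split is also the ratio a sample-ratio-mismatch (SRM) check tests against. The following is a minimal Python sketch under stated assumptions. It assumes every eligible session enters the 50/50 split. It also assumes that canary days (the hypothetical `canary_days`) are excluded from analysis and count against the 14-day cap. The SRM threshold `SRM_ALPHA = 0.001` is a commonly used convention, not a ShopNow requirement, and the counts in the SRM example are invented for illustration.

```python
from math import erfc, sqrt

DAILY_SESSIONS = 120_000   # eligible mobile checkout sessions per day
MAX_DAYS = 14              # maximum experiment duration
MIN_DAYS = 7               # must cover at least one full weekly cycle
SRM_ALPHA = 0.001          # ASSUMPTION: common strict threshold for SRM alerts


def sessions_per_arm(days: int, daily_sessions: int = DAILY_SESSIONS) -> int:
    """Expected sessions in each arm of a 50/50 split after `days` days."""
    return daily_sessions * days // 2


def srm_p_value(control: int, treatment: int) -> float:
    """Chi-square goodness-of-fit p-value (1 df) against an intended 50/50 split."""
    expected = (control + treatment) / 2
    chi2 = ((control - expected) ** 2 + (treatment - expected) ** 2) / expected
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return erfc(sqrt(chi2 / 2))


# ASSUMPTION: hypothetical canary days are excluded and count against the 14-day cap.
for canary_days in (0, 1, 2):
    split_days = MAX_DAYS - canary_days
    assert split_days >= MIN_DAYS  # still covers a full weekly cycle
    print(f"canary {canary_days}d: {sessions_per_arm(split_days):,} sessions per arm")

# Illustrative SRM check on made-up counts (not real data).
p = srm_p_value(420_000, 421_500)
print(f"SRM p-value: {p:.4f}, flag: {p < SRM_ALPHA}")
```

With no canary days excluded, the ceiling is 840,000 sessions per arm.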
Task
- Define the null and alternative hypotheses, the primary metric, and 2-4 guardrail metrics.
- Determine a reasonable MDE before launch, then calculate the required sample size per arm and check whether the test can finish within 14 days.
- Choose the unit of randomization and explain any mismatch with the unit of analysis.
- Pre-register the analysis plan: statistical test, peeking policy, multiple-comparison handling, and SRM checks.
- State a clear ship / don't-ship / iterate rule that respects both the primary metric and guardrails.
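For the sample-size step, one standard approach is the normal-approximation formula for a two-sided, two-proportion z-test. In that formula, n per arm = (z_{1-α/2} · sqrt(2 p̄(1 - p̄)) + z_{1-β} · sqrt(p1(1 - p1) + p2(1 - p2)))² / (p2 - p1)², where p̄ is the average of p1 and p2. The sketch below uses only the Python standard library. It assumes sessions are independent and randomized individually. If ShopNow randomizes by user and analyzes per session, this variance is understated and needs a correction (for example, the delta method) that the sketch does not apply. The candidate MDEs in the loop are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist

BASELINE = 0.24           # checkout start -> order completion conversion
ALPHA = 0.05              # two-sided significance level
POWER = 0.80
DAILY_SESSIONS = 120_000  # eligible sessions per day, split 50/50
MAX_DAYS = 14
MIN_DAYS = 7              # one full weekly cycle


def n_per_arm(p1: float, rel_mde: float, alpha: float = ALPHA, power: float = POWER) -> int:
    """Sessions per arm for a two-sided two-proportion z-test (normal approximation)."""
    p2 = p1 * (1 + rel_mde)      # treatment rate at the MDE
    p_bar = (p1 + p2) / 2        # pooled rate under the null (equal arm sizes)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


def days_needed(n: int, daily_sessions: int = DAILY_SESSIONS, min_days: int = MIN_DAYS) -> int:
    """Days of 50/50 traffic to reach n per arm, never fewer than one weekly cycle."""
    return max(min_days, ceil(2 * n / daily_sessions))


for rel_mde in (0.01, 0.02, 0.03, 0.05):
    n = n_per_arm(BASELINE, rel_mde)
    days = days_needed(n)
    print(f"relative MDE {rel_mde:.0%}: {n:,} per arm, {days} days, "
          f"fits in {MAX_DAYS} days: {days <= MAX_DAYS}")
```

At a 24% baseline, a 2% relative lift (24.0% to 24.48%) needs roughly 125,000 sessions per arm, a little over 2 days of traffic, so the 7-day weekly-cycle minimum rather than the sample size sets the duration. A 1% relative lift needs roughly 500,000 per arm, which still fits in 14 days.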
Be explicit about how you choose the MDE: tie it to business value, traffic constraints, and the cost of shipping a false positive versus missing a real but small win.
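One way to make the MDE choice explicit is to compute two floors and pre-register the larger one. The first is the smallest lift the 14-day traffic can detect at 80% power. The second is the smallest lift worth shipping. The sketch below finds the first by inverting the sample-size formula with bisection. It expresses the second as incremental orders per day and a break-even lift. The cost, margin-per-order, and payback-horizon inputs to `break_even_lift` are HYPOTHETICAL placeholders, not derived from the 6 engineer-weeks or any ShopNow figure. Taking the maximum of the two floors is one reasonable heuristic, not a fixed rule. It also ignores rollout risk, which argues for a somewhat larger MDE or stricter guardrails.

```python
from math import ceil, sqrt
from statistics import NormalDist

BASELINE = 0.24
DAILY_SESSIONS = 120_000
MAX_DAYS = 14


def n_per_arm(p1: float, rel_mde: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Sessions per arm for a two-sided two-proportion z-test (normal approximation)."""
    p2 = p1 * (1 + rel_mde)
    p_bar = (p1 + p2) / 2
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    num = (z_a * sqrt(2 * p_bar * (1 - p_bar))
           + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(num / (p2 - p1) ** 2)


def traffic_floor_mde(n_available: int, p1: float = BASELINE, tol: float = 1e-6) -> float:
    """Smallest relative lift whose required n per arm fits in n_available."""
    lo, hi = 1e-4, 1.0  # required n shrinks as the lift grows, so bisect on the lift
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if n_per_arm(p1, mid) <= n_available:
            hi = mid
        else:
            lo = mid
    return hi


def extra_orders_per_day(rel_lift: float, p1: float = BASELINE,
                         daily_sessions: int = DAILY_SESSIONS) -> float:
    """Incremental orders per day at full rollout for a given relative lift."""
    return daily_sessions * p1 * rel_lift


def break_even_lift(total_cost: float, margin_per_order: float, horizon_days: int,
                    p1: float = BASELINE, daily_sessions: int = DAILY_SESSIONS) -> float:
    """Relative lift whose extra margin over horizon_days just repays total_cost."""
    return total_cost / (daily_sessions * p1 * margin_per_order * horizon_days)


traffic_floor = traffic_floor_mde(DAILY_SESSIONS * MAX_DAYS // 2)
# HYPOTHETICAL placeholders: replace with ShopNow's real cost, margin, and payback window.
value_floor = break_even_lift(total_cost=150_000, margin_per_order=10.0, horizon_days=90)
print(f"traffic floor: {traffic_floor:.2%}, break-even: {value_floor:.2%}, "
      f"pre-register MDE >= {max(traffic_floor, value_floor):.2%}")
print(f"a 1% relative lift is about {extra_orders_per_day(0.01):.0f} extra orders/day")
```

Under these assumptions the 14-day traffic floor is roughly 0.77% relative (about 0.18 percentage points on a 24% baseline). Traffic therefore becomes the binding constraint only if the business case justifies shipping a lift smaller than that.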