Context
ShopNow, a mobile commerce app, is testing a simplified one-page checkout intended to reduce friction and improve completed purchase rate. Three days into the experiment, the treatment looks worse than control, and leadership asks whether the team should stop the test early.
Hypothesis Seed
The new checkout removes one confirmation step and pre-fills shipping details for logged-in users. The product team believes this will increase purchase conversion, but there is concern that it may also confuse users, increase payment failures, or produce a short-term novelty effect that distorts early results.
Constraints
- Eligible traffic: 120,000 checkout starts per day
- 80% of traffic is mobile app, 20% mobile web; desktop is excluded
- Maximum experiment window: 14 days, because a seasonal promotion starts after that
- Baseline completed purchase rate (completed purchases divided by checkout starts) is 40%
- The smallest worthwhile improvement is a 2% relative lift in completed purchase rate, i.e. from 40.0% to 40.8% (0.8 percentage points absolute)
- False positives (shipping a checkout that does not actually help) are costly because a bad checkout experience directly harms revenue; false negatives are less costly but still undesirable
- The team wants a clear rule for when to stop early for harm versus when to wait for the pre-registered readout
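As a rough illustration of how the constraints above translate into a sample size, here is a minimal sketch using the standard normal-approximation formula for comparing two proportions. The two-sided alpha of 0.05, the 80% power, the 50/50 split, and the treatment of each checkout start as an independent observation are assumptions made for this sketch, not requirements stated in the brief; the function name `sample_size_per_arm` is likewise hypothetical.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_arm(baseline, relative_lift, alpha=0.05, power=0.80):
    """Per-arm n for a two-sided two-proportion z-test (unpooled variance).

    alpha and power defaults are assumptions, not values from the brief.
    """
    p1 = baseline
    p2 = baseline * (1 + relative_lift)  # rate under the minimum worthwhile lift
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


DAILY_CHECKOUT_STARTS = 120_000  # from the constraints
BASELINE_RATE = 0.40             # from the constraints
RELATIVE_MDE = 0.02              # from the constraints

absolute_mde = BASELINE_RATE * RELATIVE_MDE
n_per_arm = sample_size_per_arm(BASELINE_RATE, RELATIVE_MDE)
total_needed = 2 * n_per_arm  # assumes an equal 50/50 allocation
days_needed = ceil(total_needed / DAILY_CHECKOUT_STARTS)

print(f"Absolute MDE: {absolute_mde:.3f}")
print(f"Checkout starts per arm: {n_per_arm:,}")
print(f"Days of eligible traffic needed: {days_needed}")
```

Under these assumptions the requirement comes out at roughly 59,000 checkout starts per arm, which the stated traffic would supply in about a day. A stricter alpha, user-level randomization with repeat visitors, or a partial traffic allocation would all raise that figure, so the number is best read as a lower bound on duration rather than a recommended run length.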
Deliverables
- State the null and alternative hypotheses, including whether you would use a one-sided or two-sided test.
- Define the primary metric, two to four guardrail metrics, and an explicit minimum detectable effect (MDE). Explain which metric should govern an early-stop decision if the test appears to be failing.
- Calculate the required sample size and expected duration using the provided traffic assumptions.
- Propose the experiment design: unit of randomization, allocation, duration, and any stratification.
- Pre-register the analysis plan, including the statistical test, peeking/early stopping policy, multiple-comparison handling, and how you would respond to sample ratio mismatch or novelty effects.
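The sample ratio mismatch (SRM) check mentioned in the last deliverable can be sketched as a chi-square goodness-of-fit test on the observed arm counts. This is one common way to run the check, not a method prescribed by the brief. The function name `srm_p_value`, the 50/50 expected split, the example counts, and the 0.001 alert threshold are all assumptions chosen for illustration.

```python
from math import erfc, sqrt


def srm_p_value(n_control, n_treatment, expected_control_share=0.5):
    """Chi-square goodness-of-fit p-value (1 degree of freedom) for arm counts."""
    total = n_control + n_treatment
    expected_control = total * expected_control_share
    expected_treatment = total - expected_control
    chi2 = ((n_control - expected_control) ** 2 / expected_control
            + (n_treatment - expected_treatment) ** 2 / expected_treatment)
    # For 1 degree of freedom, the chi-square survival function is erfc(sqrt(x / 2)).
    return erfc(sqrt(chi2 / 2))


SRM_THRESHOLD = 0.001  # assumed alert threshold, not taken from the brief

# Hypothetical counts for illustration only.
p_value = srm_p_value(60_000, 61_500)
if p_value < SRM_THRESHOLD:
    print(f"Possible SRM (p = {p_value:.2g}): investigate assignment and logging first")
else:
    print(f"No evidence of SRM (p = {p_value:.2g})")
```

A very small p-value suggests that assignment or logging is broken, in which case the conversion comparison itself should not be trusted until the cause is found.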