Context
ShopNow, a mid-size e-commerce app, wants to test a simplified checkout page that removes one form step. The product team believes the change will increase purchase conversion, but leadership wants a rigorous experiment design before the new page is exposed to most traffic.
Hypothesis Seed
The new checkout reduces friction and should improve the proportion of checkout starters who complete a purchase. The team expects a modest lift, not a dramatic one, and wants to know whether the available traffic is enough to detect a meaningful improvement.
Constraints
- Eligible traffic: 120,000 checkout starters per day
- Maximum experiment runtime: 14 days, including weekends
- Traffic allocation target: 50/50 after a brief instrumentation ramp
- Baseline purchase conversion from checkout start: 12.0%
- The smallest business-meaningful lift is +5% relative in conversion (from 12.0% to 12.6%, i.e. +0.6 percentage points absolute)
- False positives are costly because a bad checkout experience can hurt revenue and trust; false negatives are acceptable if the missed lift is very small
- The team must make a ship / don’t-ship decision by the end of week 2
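The constraints above fix both the traffic ceiling and the absolute size of the target effect. As a minimal sketch of that arithmetic, the snippet below assumes the instrumentation ramp is short enough to ignore and that the full 14 days run at 50/50; the variable names are illustrative and not part of the brief.

```python
# Figures taken from the Constraints list.
DAILY_STARTERS = 120_000
MAX_DAYS = 14
BASELINE_RATE = 0.12
RELATIVE_LIFT = 0.05

# Assumption: the ramp is ignored and all 14 days run at a 50/50 split.
total_starters = DAILY_STARTERS * MAX_DAYS
per_arm_capacity = total_starters // 2

# Convert the relative lift into a target rate and an absolute difference.
target_rate = BASELINE_RATE * (1 + RELATIVE_LIFT)
absolute_mde = target_rate - BASELINE_RATE

print(f"per-arm capacity over {MAX_DAYS} days: {per_arm_capacity:,}")
print(f"target rate: {target_rate:.1%}, absolute MDE: {absolute_mde:.4f}")
```

Under these assumptions the experiment can enroll at most 840,000 checkout starters per arm, which is the ceiling any required sample size has to fit under.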
Deliverables
- State the null and alternative hypotheses, and specify whether the primary test should be one-sided or two-sided.
- Define the primary metric, at least two guardrail metrics, and the minimum detectable effect (MDE). Be explicit about the unit of randomization and unit of analysis.
- Calculate the required sample size per arm for the primary conversion metric using the stated baseline and MDE, together with a significance level and power that you choose and justify. Then translate that sample size into expected runtime given the traffic constraint.
- Pre-register an analysis plan: statistical test, treatment of guardrails and secondary metrics, peeking policy, and what to do if there is a sample ratio mismatch.
- Give a final decision rule for ship / don’t-ship / iterate that respects both the primary metric and guardrails, and mention key pitfalls such as novelty effects or interference across users.
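The sample-size and sample-ratio-mismatch deliverables can be sketched with the standard library alone. The code below is one possible approach, not a prescribed answer: it uses the normal-approximation formula for a two-proportion z-test with unpooled variance and a chi-square goodness-of-fit check with one degree of freedom. The significance level of 0.05 (two-sided) and power of 0.80 are assumptions, since the brief does not fix them, and the function names and the example arm counts are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(p_control, relative_lift, alpha=0.05, power=0.80,
                        two_sided=True):
    """Approximate per-arm n for a two-proportion z-test (unpooled variance)."""
    p_treat = p_control * (1 + relative_lift)
    delta = p_treat - p_control
    tail = alpha / 2 if two_sided else alpha
    z_alpha = NormalDist().inv_cdf(1 - tail)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treat * (1 - p_treat)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / delta ** 2)


def srm_p_value(n_control, n_treatment, expected_share_control=0.5):
    """Chi-square goodness-of-fit p-value (1 df) for the observed split."""
    total = n_control + n_treatment
    exp_control = total * expected_share_control
    exp_treatment = total - exp_control
    chi2 = ((n_control - exp_control) ** 2 / exp_control
            + (n_treatment - exp_treatment) ** 2 / exp_treatment)
    # With 1 degree of freedom, the chi-square tail equals erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


# Assumed settings: alpha = 0.05 two-sided, power = 0.80.
n = sample_size_per_arm(p_control=0.12, relative_lift=0.05)
days_needed = math.ceil(2 * n / 120_000)
print(f"required per arm: {n:,}; days of traffic needed: {days_needed}")

# Hypothetical arm counts, used only to exercise the SRM check.
print(f"SRM p-value, balanced split: {srm_p_value(60_000, 60_000):.3f}")
print(f"SRM p-value, skewed split:   {srm_p_value(60_000, 61_500):.2e}")
```

With these assumed settings the requirement comes to roughly 47,000 checkout starters per arm, which is less than one day of eligible traffic. The 14-day limit is therefore not binding on sample size, although running in whole weeks still covers the weekday and weekend cycle noted in the constraints. Because false positives are described as costly, a stricter significance level may be preferable; lowering `alpha` raises the required n. A very small SRM p-value suggests the assignment or logging is broken and that the primary result should not be trusted until the cause is found.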