Context
ShopNow is testing a redesigned mobile checkout that reduces the number of form fields and adds one-tap address autofill. Early directional reads suggest checkout completion improved, but average order value may have declined.
Hypothesis Seed
The product team believes the simpler checkout will increase completed purchases by reducing friction, especially for first-time buyers. However, finance is concerned the faster flow may reduce add-on purchases and lower revenue per visitor.
Constraints
- Eligible traffic: 240,000 mobile checkout starters per day
- Maximum experiment duration: 14 days; a launch decision is needed before a seasonal campaign
- Allocation can be 50/50 after a brief instrumentation ramp
- Baseline checkout completion rate: 38%
- Baseline revenue per checkout starter: $24.00, with standard deviation $60
- Business requirement: the minimum detectable effect (MDE) is a 2% relative lift in checkout completion, i.e. from 38.00% to 38.76%
- Guardrail: do not ship if revenue per checkout starter drops by more than 1% relative, or payment error rate rises by more than 0.2 percentage points
- False positives are costly because a bad launch during the campaign would directly hurt revenue; false negatives are acceptable if the team can iterate next sprint
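The relative thresholds above can be turned into absolute numbers before any power calculation. The following is a minimal sketch using only the figures in the Constraints list; the variable names are illustrative, and the traffic ceiling ignores the instrumentation ramp, whose length is not specified.

```python
# Figures copied from the Constraints list; variable names are illustrative.
daily_starters = 240_000
max_days = 14
baseline_completion = 0.38
baseline_rpv = 24.00            # revenue per checkout starter, in dollars
relative_mde = 0.02             # 2% relative lift in completion
rpv_guardrail_relative = 0.01   # 1% relative drop in revenue per starter

# Traffic available under a 50/50 split (upper bound: ramp days not subtracted).
per_arm_per_day = daily_starters // 2
max_per_arm = per_arm_per_day * max_days

# Relative thresholds expressed in absolute terms.
target_completion = baseline_completion * (1 + relative_mde)
absolute_mde = target_completion - baseline_completion
rpv_floor = baseline_rpv * (1 - rpv_guardrail_relative)

print(f"per arm per day: {per_arm_per_day:,}")
print(f"max per arm over {max_days} days: {max_per_arm:,}")
print(f"target completion: {target_completion:.4f} (absolute MDE {absolute_mde:.4f})")
print(f"revenue guardrail floor: ${rpv_floor:.2f}")
```

In absolute terms the primary metric has to move by 0.76 percentage points, while the revenue guardrail trips at a drop of $0.24 per checkout starter.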
Task
- Define the experiment hypothesis, primary metric, and guardrails, and explain how you would make a recommendation when the primary metric is positive but a key guardrail is negative.
- Calculate the required sample size and expected runtime for the primary metric using the stated MDE; state and justify the alpha and power you assume, since the constraints do not fix them.
- Choose the unit of randomization and analysis, and explain any mismatch risks.
- Pre-register an analysis plan covering the statistical test, multiple comparisons policy, peeking policy, and the exact ship / don't-ship rule.
- Name the main pitfalls you would watch for in this experiment, including at least one issue related to conflicting metrics or biased reads across segments.
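As a starting point for the sample-size task, here is a hedged sketch of a per-arm calculation for a two-sided two-proportion z-test. The prompt does not specify alpha or power, so `ASSUMED_ALPHA = 0.01` and `ASSUMED_POWER = 0.80` are assumptions chosen only to reflect the stated preference for avoiding false positives; the function name is hypothetical, and a different test or correction policy would change the result.

```python
import math
from statistics import NormalDist


def two_proportion_sample_size(p_control, relative_lift, alpha, power):
    """Per-arm sample size for a two-sided two-proportion z-test (normal approximation)."""
    p_treat = p_control * (1 + relative_lift)
    p_bar = (p_control + p_treat) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_power * math.sqrt(p_control * (1 - p_control) + p_treat * (1 - p_treat))
    ) ** 2
    return math.ceil(numerator / (p_treat - p_control) ** 2)


# Assumptions, not given in the prompt: strict alpha, conventional power.
ASSUMED_ALPHA = 0.01
ASSUMED_POWER = 0.80

n_per_arm = two_proportion_sample_size(0.38, 0.02, ASSUMED_ALPHA, ASSUMED_POWER)
per_arm_per_day = 240_000 // 2  # 50/50 split of eligible daily traffic
days_needed = math.ceil(n_per_arm / per_arm_per_day)

print(f"required per arm: {n_per_arm:,}")
print(f"days at full allocation: {days_needed}")
```

Under these assumed values the primary metric needs somewhat under 100,000 checkout starters per arm, which is less than one day of traffic at full allocation. The revenue guardrail is much noisier (standard deviation $60 against a $0.24 threshold), so it would need its own calculation and may turn out to be what determines runtime.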