Context
ShopNow, a mid-sized e-commerce app, wants to test a redesigned mobile checkout that removes one review step and highlights express payment methods. Leadership cares about conversion but is explicitly worried that optimizing only for the primary metric could hide harm elsewhere.
Hypothesis Seed
The team believes the simplified checkout will increase completed purchase rate by reducing friction. However, it could also increase accidental purchases, payment failures, refund requests, or customer-support contacts. You are asked to design the experiment with guardrails that would prevent shipping a misleading “win.”
Constraints
- Eligible traffic: 240,000 mobile checkout sessions per day
- 85% of traffic comes from the iOS/Android app, 15% from mobile web
- Baseline checkout completion rate: 38%
- Maximum experiment duration: 14 days
- The business wants 80% power at a 5% two-sided significance level
- Small false positives are costly because checkout bugs directly affect revenue and trust
- False negatives are also costly because the redesign is expected to reduce abandonment before peak holiday traffic
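As a feasibility sanity check, the constraints above can be plugged into the standard two-proportion sample-size formula. This is a sketch only: the 1-percentage-point MDE and the 50/50 split are illustrative assumptions, not values given in the brief.

```python
import math
from statistics import NormalDist

def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Per-arm n for a two-sided z-test comparing two independent proportions."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)            # 0.84 for 80% power
    p_bar = (p1 + p2) / 2
    num = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
           + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(num / (p2 - p1) ** 2)

baseline = 0.38            # given: baseline checkout completion rate
mde = 0.01                 # ASSUMPTION: +1pp absolute lift worth detecting
n_per_arm = sample_size_two_proportions(baseline, baseline + mde)
total_n = 2 * n_per_arm    # ASSUMPTION: 50/50 allocation across two arms
days = total_n / 240_000   # given: eligible mobile checkout sessions per day
print(n_per_arm, total_n, round(days, 2))
```

With these inputs the required sample is on the order of 37,000 sessions per arm, reached in well under a day at full allocation; even so, experiments are typically run for at least one full week to average over day-of-week effects, which still fits comfortably inside the 14-day cap.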
Tasks
- Define the experiment hypothesis, primary metric, and 2-4 guardrail metrics. Be explicit about why each guardrail matters and what threshold would block a launch.
- Calculate the required sample size for the primary metric using a clearly stated MDE, and translate that into expected runtime given available traffic.
- Choose the unit of randomization, allocation, duration, and any stratification or ramp plan. Explain why your design avoids contamination.
- Pre-register an analysis plan: statistical test, peeking policy, multiple-comparison treatment, and how you will handle any mismatch between unit of randomization and unit of analysis.
- State a clear ship / don’t-ship / iterate rule that respects guardrails, and identify key pitfalls such as novelty effects, sample ratio mismatch, and interference across devices or users.
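One pitfall named in the last task, sample ratio mismatch, is easy to automate as a daily check. The sketch below runs a chi-square goodness-of-fit test of observed arm counts against the planned split; the 0.001 alert threshold is a common convention, not something specified in the brief, and the example counts are hypothetical.

```python
import math
from statistics import NormalDist

def srm_check(n_control, n_treatment, expected_treatment_share=0.5, alert_p=0.001):
    """Chi-square goodness-of-fit test of observed arm counts vs the planned split.

    With 1 degree of freedom, P(chi2 > x) = 2 * (1 - Phi(sqrt(x))),
    so the p-value needs only the standard normal CDF.
    Returns (p_value, srm_flagged).
    """
    total = n_control + n_treatment
    exp_t = total * expected_treatment_share
    exp_c = total - exp_t
    chi2 = ((n_control - exp_c) ** 2 / exp_c
            + (n_treatment - exp_t) ** 2 / exp_t)
    p_value = 2 * (1 - NormalDist().cdf(math.sqrt(chi2)))
    return p_value, p_value < alert_p

# Hypothetical counts: treatment lands ~2% short of its expected half.
p, flagged = srm_check(120_000, 117_600)
```

At these sample sizes even a 1-2% imbalance is wildly improbable under correct assignment, so a flagged SRM should halt analysis: the remedy is to find and fix the assignment or logging bug, not to interpret the biased data.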