Context
ShopNow is testing a simplified mobile checkout flow intended to increase completed purchases. Midway through the experiment, the team notices that the treatment arm is receiving materially less traffic than its planned 50% share and asks whether the experiment's results can still be trusted.
Hypothesis Seed
The new checkout removes one confirmation step and auto-fills saved shipping information. Product expects a modest lift in purchase conversion, but an allocation bug or logging issue could create a sample ratio mismatch (SRM), which would make any treatment-effect estimate invalid because the two arms may no longer be comparable.
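One common way to detect SRM is a chi-square goodness-of-fit test that compares the observed number of users assigned to each arm with the counts implied by the planned 50/50 split. The sketch below assumes you have per-arm assignment counts; the function name `srm_check`, the example counts, and the p < 0.001 alarm threshold are illustrative assumptions (a strict threshold is a common convention because the check is repeated daily), not part of the original plan.

```python
import math


def srm_check(n_control: int, n_treatment: int, expected_share: float = 0.5):
    """Chi-square goodness-of-fit test for sample ratio mismatch (SRM).

    Compares observed per-arm counts with the counts implied by the planned
    allocation. With two arms the statistic has 1 degree of freedom, so the
    p-value can be computed with math.erfc instead of a stats library.
    """
    total = n_control + n_treatment
    expected_c = total * expected_share
    expected_t = total * (1 - expected_share)
    chi2 = ((n_control - expected_c) ** 2 / expected_c
            + (n_treatment - expected_t) ** 2 / expected_t)
    # Survival function of chi-square with 1 df: P(X > chi2) = erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical cumulative counts; the p < 0.001 alarm threshold is an assumed convention.
chi2, p = srm_check(120_000, 118_000)
print(f"chi2={chi2:.2f}, p={p:.2e}, SRM={'yes' if p < 0.001 else 'no'}")
```

With these hypothetical counts the treatment share is only about 49.6%, yet the chi-square statistic (about 16.8) is well above the 1-df critical value for p = 0.001 (about 10.83). At this traffic volume, even a split that looks nearly balanced is strong evidence of an allocation or logging problem.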
Constraints
- Eligible traffic: 240,000 mobile checkout starters per day
- Planned allocation: 50/50 control vs treatment
- Maximum runtime: 14 days
- Baseline purchase conversion from checkout start: 24%
- Smallest business-relevant lift: 2% relative (24.0% → 24.48% purchase conversion)
- False positives are costly because a broken checkout harms revenue immediately; false negatives are acceptable if they only delay launch by one sprint
- The team wants daily monitoring for SRM, but no repeated significance testing on the primary metric before the pre-registered readout
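Using the constraints above, the per-arm sample size for a two-sided, two-proportion test can be sketched with the standard normal-approximation formula. The alpha of 0.05 and power of 0.80 are assumptions: the cost asymmetry in the constraints argues for weighting false positives more heavily, so a stricter alpha is a reasonable alternative. The 7-day minimum runtime, intended to cover a full weekly cycle, is likewise an assumed policy rather than a stated constraint, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(p_control: float, relative_mde: float,
                        alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-arm n for a two-sided, two-proportion z-test (normal approximation)."""
    p_treat = p_control * (1 + relative_mde)
    p_bar = (p_control + p_treat) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p_control * (1 - p_control)
                                      + p_treat * (1 - p_treat))) ** 2
    return math.ceil(numerator / (p_treat - p_control) ** 2)


def runtime_days(n_per_arm: int, daily_traffic: int, min_days: int = 7) -> int:
    """Days needed to fill both arms at 50/50, floored at an assumed full-week minimum."""
    return max(math.ceil(2 * n_per_arm / daily_traffic), min_days)


n = sample_size_per_arm(0.24, 0.02)   # baseline 24%, 2% relative MDE
days = runtime_days(n, 240_000)       # 240,000 eligible starters per day
print(n, days)
```

Under these assumptions the primary metric needs roughly 125,000 users per arm, about 250,000 in total. That is only slightly more than one day of eligible traffic, so the runtime is set by the weekly-cycle floor rather than by power, well within the 14-day maximum. Tightening alpha to 0.01 raises the requirement to roughly 186,000 per arm, which still fits in under two days of traffic.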
Deliverables
- State the null and alternative hypotheses for both the product effect and the SRM diagnostic, and explain how you would identify SRM in the experiment dataset.
- Define the primary metric, 2-4 guardrails, and at least one secondary metric. Include the unit of randomization and unit of analysis.
- Calculate the required sample size for the primary metric using explicit assumptions for the baseline rate, minimum detectable effect (MDE), alpha, and power, then translate that into expected runtime given available traffic.
- Pre-register an analysis plan covering the statistical test, peeking policy, multiple-comparison correction, and what happens if SRM is detected at any point.
- Give a clear ship / don’t-ship / investigate decision rule that respects guardrails, and explain why SRM can invalidate otherwise significant results.
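A decision rule of this shape can be sketched as a small function applied once, at the pre-registered readout. Everything here is illustrative. The function names, the pooled two-proportion z-test, the assumption that guardrail results arrive as a single pass/fail flag, and the thresholds (alpha = 0.05 for the primary metric, p < 0.001 for SRM) are all assumptions, not a prescribed plan. The ordering is the key design choice. The SRM check runs first because, if the arms are not comparable, neither the guardrail readings nor the primary-metric p-value can be trusted.

```python
import math
from statistics import NormalDist


def two_proportion_ztest(conv_c: int, n_c: int, conv_t: int, n_t: int):
    """Pooled two-sided z-test; returns (absolute difference in rates, p-value)."""
    p_c, p_t = conv_c / n_c, conv_t / n_t
    pooled = (conv_c + conv_t) / (n_c + n_t)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_c + 1 / n_t))
    z = (p_t - p_c) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_t - p_c, p_value


def decide(srm_p: float, guardrails_ok: bool, diff: float, p_value: float,
           alpha: float = 0.05, srm_alpha: float = 0.001) -> str:
    """Hypothetical ship / don't-ship / investigate rule, evaluated in order."""
    # 1. SRM overrides everything: the arms may not be comparable.
    if srm_p < srm_alpha:
        return "investigate"
    # 2. Any breached guardrail blocks launch regardless of the primary metric.
    if not guardrails_ok:
        return "don't ship"
    # 3. Ship only on a significant lift in the intended direction.
    if p_value < alpha and diff > 0:
        return "ship"
    # Non-significant or negative: acceptable false-negative risk, revisit next sprint.
    return "don't ship"


# Hypothetical readout numbers, for illustration only.
diff, p = two_proportion_ztest(conv_c=28_800, n_c=120_000, conv_t=30_000, n_t=120_000)
print(decide(srm_p=0.42, guardrails_ok=True, diff=diff, p_value=p))
```

Because the SRM branch returns before the p-value is consulted, a highly significant lift still yields "investigate" when assignment counts are mismatched. A skewed split can mean that users who hit the bug, or whose events were dropped, differ systematically between arms, so the measured difference may reflect who was counted rather than the new checkout.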