Context
ShopNow is testing a redesigned mobile checkout flow intended to reduce friction and increase completed purchases. The product lead wants a decision within two weeks because the current quarter's launch calendar closes soon.
Hypothesis Seed
The new checkout removes one confirmation step and pre-fills saved shipping details. The team believes this will increase purchase conversion, but it could also create accidental purchases, payment failures, or short-lived novelty effects.
Constraints
- Eligible traffic: 120,000 mobile checkout-start sessions per day
- 85% of traffic is logged-in users; 15% are guest users
- Maximum experiment length: 14 days
- Randomization can be done by user_id for logged-in users and by device_id for guests
- A false positive is costly because a bad checkout experience directly hurts revenue and trust
- A false negative also has a cost, because checkout is a major growth lever, but it is less costly than shipping a harmful change
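Randomizing on user_id or device_id, rather than per session, keeps a returning customer in the same arm across visits. Below is a minimal sketch of deterministic salted-hash assignment. The salt `checkout_redesign_v1` and the helper names `randomization_unit` and `assign_arm` are hypothetical and are not part of the brief.

```python
import hashlib
from typing import Optional

# Hypothetical per-experiment salt; changing it reshuffles all assignments.
EXPERIMENT_SALT = "checkout_redesign_v1"

def randomization_unit(user_id: Optional[str], device_id: str) -> str:
    """Key logged-in traffic on user_id and guest traffic on device_id."""
    if user_id:
        return f"user:{user_id}"
    return f"device:{device_id}"

def assign_arm(unit_key: str, treatment_share: float = 0.5) -> str:
    """Deterministically map a unit to 'control' or 'treatment' via a salted hash."""
    digest = hashlib.sha256(f"{EXPERIMENT_SALT}:{unit_key}".encode("utf-8")).hexdigest()
    # The first 8 hex digits give a 32-bit integer; scale it to a bucket in [0, 1).
    bucket = int(digest[:8], 16) / 2**32
    return "treatment" if bucket < treatment_share else "control"

if __name__ == "__main__":
    key = randomization_unit(user_id=None, device_id="d_42")  # a guest session
    print(key, assign_arm(key))                               # same arm on every call
```

This scheme has one caveat. A guest who later logs in switches from a device key to a user key and can land in a different arm. That cross-identity contamination is worth monitoring.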
Task
Design the experiment and explain the main pitfalls you would watch for when analyzing it.
- State the null and alternative hypotheses, define the primary metric and at least three guardrail metrics, and set an explicit minimum detectable effect (MDE).
- Calculate the required sample size and determine whether the test can be completed within 14 days given the available traffic.
- Choose the unit of randomization, allocation, duration, and any stratification or blocking you would use.
- Pre-register an analysis plan: statistical test, peeking policy, multiple-comparisons policy, and how you will handle any mismatch between unit of randomization and unit of analysis.
- Identify the most important analysis pitfalls for this experiment—such as peeking, novelty effects, network interference, SUTVA violations, Simpson's paradox, and sample ratio mismatch—and explain how each would affect interpretation and what mitigation you would use.
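For the sample-size item, the standard two-sided, two-proportion z-test calculation can be checked against the traffic budget of 120,000 sessions per day for at most 14 days. The sketch below uses illustrative inputs that the brief does not supply: a 40% baseline conversion, a 2% relative MDE, and a design effect of 1.5. The design effect inflates the session count because sessions are analyzed while users or devices are randomized. The function names are also illustrative.

```python
import math
from statistics import NormalDist

DAILY_SESSIONS = 120_000  # eligible mobile checkout-start sessions per day (from the constraints)
MAX_DAYS = 14             # maximum experiment length (from the constraints)

def sample_size_per_arm(baseline: float, relative_mde: float,
                        alpha: float = 0.05, power: float = 0.80) -> int:
    """Independent units per arm for a two-sided two-proportion z-test, 50/50 split."""
    p1 = baseline
    p2 = baseline * (1 + relative_mde)
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    root = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
            + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2)))
    return math.ceil(root ** 2 / (p2 - p1) ** 2)

def days_needed(n_per_arm: int, design_effect: float = 1.0,
                daily_sessions: int = DAILY_SESSIONS) -> int:
    """Days of traffic to fill both arms; design_effect > 1 accounts for correlated
    repeat sessions from the same randomized user or device."""
    return math.ceil(2 * n_per_arm * design_effect / daily_sessions)

if __name__ == "__main__":
    # Hypothetical inputs: 40% baseline conversion, 2% relative MDE, design effect 1.5.
    n = sample_size_per_arm(baseline=0.40, relative_mde=0.02)
    d = days_needed(n, design_effect=1.5)
    print(f"{n} sessions per arm, {d} days, fits in {MAX_DAYS} days: {d <= MAX_DAYS}")
```

Traffic may suffice in a few days, but running whole weeks (7 or 14 days) still covers day-of-week patterns and gives novelty effects time to decay. A false positive is the costlier error here, so a tighter alpha (for example 0.01) is a reasonable choice, and it raises the required sample size. The design effect of 1.5 is only a placeholder and should be estimated from historical sessions-per-user data.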
Be explicit about the final ship / don't-ship rule. Your answer should assume leadership will ask for a recommendation immediately after the readout, so the decision framework must be operational rather than theoretical.
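Leadership wants a recommendation immediately after the readout, so the ship / don't-ship rule can be fixed in advance as an ordered checklist. The sketch below rests on several assumptions. Sample ratio mismatch is tested with a chi-square test against a hypothetical 0.001 threshold. The primary-metric confidence interval comes precomputed from whatever pre-registered, unit-appropriate test is used. Guardrails, where an increase is harmful, carry hypothetical non-inferiority margins. The function names and the guardrail name `payment_failure_rate` are illustrative and are not part of the brief.

```python
import math
from typing import Dict, Tuple

def srm_p_value(n_control: int, n_treatment: int,
                expected_treatment_share: float = 0.5) -> float:
    """Chi-square goodness-of-fit p-value (1 df) for the observed arm split."""
    total = n_control + n_treatment
    exp_t = total * expected_treatment_share
    exp_c = total - exp_t
    chi2 = (n_treatment - exp_t) ** 2 / exp_t + (n_control - exp_c) ** 2 / exp_c
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))

def ship_decision(srm_p: float,
                  primary_ci: Tuple[float, float],
                  guardrails: Dict[str, Tuple[float, float]],
                  srm_threshold: float = 0.001) -> str:
    """primary_ci: CI for treatment minus control on purchase conversion.
    guardrails: name -> (CI upper bound of the treatment-minus-control change,
    hypothetical tolerated margin), for metrics where an increase is bad."""
    # 1. A sample ratio mismatch invalidates every downstream comparison.
    if srm_p < srm_threshold:
        return "INVALID: sample ratio mismatch, investigate before any decision"
    # 2. Any guardrail whose harm cannot be ruled out beyond its margin blocks shipping.
    breached = sorted(name for name, (upper, margin) in guardrails.items() if upper > margin)
    if breached:
        return "DON'T SHIP: guardrail harm not ruled out: " + ", ".join(breached)
    # 3. Ship only if the primary-metric lift is demonstrably positive.
    if primary_ci[0] > 0:
        return "SHIP"
    return "DON'T SHIP: no demonstrated lift on the primary metric"

if __name__ == "__main__":
    # Hypothetical readout numbers, for illustration only.
    p = srm_p_value(n_control=420_000, n_treatment=419_100)
    print(ship_decision(p, primary_ci=(0.002, 0.011),
                        guardrails={"payment_failure_rate": (0.0008, 0.002)}))
```

The order of the checks matters. An SRM failure voids the readout, and a guardrail breach overrides a positive primary result. This ordering reflects the brief's position that a false positive costs more than a false negative.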