Context
ShopNow is testing a redesigned checkout call-to-action button on mobile web. The product manager asks: How long should we run the experiment, and what should we consider before deciding whether to stop and ship?
Hypothesis Seed
The new CTA uses clearer copy and stronger contrast, and the team believes it will increase completed purchase rate by reducing hesitation on the final checkout step. The expected gain is modest, so the team wants a disciplined experiment rather than relying on early directional results.
Constraints
- Eligible traffic: 120,000 mobile checkout sessions per day
- Current completed purchase rate from checkout start: 12.0%
- Engineering wants a decision within 21 days
- False positives are costly because a bad launch directly reduces revenue
- False negatives are also costly: this is a high-traffic funnel step, so a missed improvement forgoes meaningful revenue
- The team can randomize 50/50 after a 1-day instrumentation ramp
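As a rough check on what these constraints allow, the sketch below turns the traffic and deadline into a per-arm session budget and an approximate floor on the detectable effect. It assumes the 1-day ramp counts against the 21-day window, that ramp traffic is excluded from analysis, that sessions are independent observations, and a two-sided alpha of 0.05 with 80% power. None of these are given in the constraints; the constant names are illustrative.

```python
import math
from statistics import NormalDist

DAILY_SESSIONS = 120_000   # from the constraints
DEADLINE_DAYS = 21         # from the constraints
RAMP_DAYS = 1              # from the constraints; assumed to count against the deadline
BASELINE_RATE = 0.12       # from the constraints
ALPHA = 0.05               # assumption: two-sided significance level
POWER = 0.80               # assumption: target power

# Days of full 50/50 traffic left once the ramp day is excluded.
test_days = DEADLINE_DAYS - RAMP_DAYS
sessions_per_arm = DAILY_SESSIONS * test_days // 2

# Normal-approximation floor on the absolute effect detectable with that budget,
# using the baseline variance for both arms.
z_total = NormalDist().inv_cdf(1 - ALPHA / 2) + NormalDist().inv_cdf(POWER)
abs_mde_floor = z_total * math.sqrt(
    2 * BASELINE_RATE * (1 - BASELINE_RATE) / sessions_per_arm
)
rel_mde_floor = abs_mde_floor / BASELINE_RATE

print(f"test days: {test_days}, sessions per arm: {sessions_per_arm:,}")
print(f"approx. smallest detectable lift: {abs_mde_floor:.4%} absolute, "
      f"{rel_mde_floor:.2%} relative")
```

Under these assumptions the full window supports a relative MDE of roughly 1%. Any MDE the team chooses below that floor cannot be powered within the deadline.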
Task
- Define the hypothesis, primary metric, secondary metrics, and guardrails. Be explicit about the minimum detectable effect (MDE) you would power for.
- Calculate the required sample size and translate it into a recommended experiment duration using the traffic provided. Show the math and state any assumptions.
- Choose the unit of randomization and explain whether your unit of analysis matches it. If not, explain how you will analyze correctly.
- Pre-register the analysis plan: statistical test, alpha, power, peeking policy, multiple-comparison policy, and how you will check for sample ratio mismatch.
- State a clear ship / don’t ship / iterate decision rule that uses both the primary metric and the guardrails. Explain what else must hold before you conclude the experiment has run long enough, for example coverage of weekly seasonality, novelty effects, and operational anomalies.
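The sample-size and sample-ratio-mismatch steps in the task can be sketched as follows. This is one possible approach, not a prescribed answer. It assumes a 2% relative MDE, a two-sided alpha of 0.05, 80% power, a two-proportion z-test, and sessions treated as independent units; if randomization is by user, the variance needs a cluster-aware correction. The function names, the 2% MDE, and the 0.001 SRM threshold are all illustrative choices.

```python
import math
from statistics import NormalDist

def sample_size_per_arm(p_base, rel_mde, alpha=0.05, power=0.80):
    """Per-arm sample size for a two-sided two-proportion z-test."""
    p_new = p_base * (1 + rel_mde)
    p_bar = (p_base + p_new) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p_base * (1 - p_base) + p_new * (1 - p_new))
    ) ** 2
    return math.ceil(numerator / (p_new - p_base) ** 2)

def srm_p_value(n_control, n_treatment):
    """Chi-square goodness-of-fit p-value against an expected 50/50 split (1 df)."""
    expected = (n_control + n_treatment) / 2
    chi2 = ((n_control - expected) ** 2 + (n_treatment - expected) ** 2) / expected
    # For 1 degree of freedom, the chi-square survival function equals erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))

DAILY_SESSIONS = 120_000   # from the constraints
REL_MDE = 0.02             # assumption: 2% relative lift (12.0% -> 12.24%)

n_per_arm = sample_size_per_arm(0.12, REL_MDE)
days_raw = math.ceil(2 * n_per_arm / DAILY_SESSIONS)
# Round up to whole weeks so every weekday is represented equally.
days_full_weeks = math.ceil(days_raw / 7) * 7

print(f"sessions per arm: {n_per_arm:,}")
print(f"days at full traffic: {days_raw}, rounded to full weeks: {days_full_weeks}")

# Hypothetical assignment counts, for illustration only.
p_srm = srm_p_value(600_000, 594_000)
print(f"SRM p-value: {p_srm:.2e} (investigate if below the pre-registered 0.001)")
```

Rounding the duration up to whole weeks reflects the weekly-seasonality concern in the last task item. With a different MDE, alpha, or power the required sample changes substantially, so these inputs belong in the pre-registered plan rather than being picked after the data arrive.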