Context
ShopNow, a mid-size e-commerce app, wants to test a simplified mobile checkout flow that removes one form step. Leadership needs a decision this month because the current quarter's roadmap depends on whether the redesign is worth rolling out.
Hypothesis Seed
The product team believes the shorter checkout will increase purchase conversion by reducing friction. They are unsure what minimum detectable effect (MDE) is realistic and how that choice should shape sample size, duration, and decision-making. They expect a modest relative lift of 1% to 3%, which at the 24% baseline means a treatment conversion rate of roughly 24.24% to 24.72%. They also want to avoid shipping a change that improves conversion slightly but hurts average order value or payment success.
Constraints
- Eligible traffic: 120,000 mobile checkout sessions per day
- Baseline purchase conversion from checkout start to completed order: 24%
- Maximum experiment duration: 14 days, including at least one full weekend cycle
- Allocation can be 50/50 after a short instrumentation ramp
- False positives are costly because checkout bugs directly affect revenue
- False negatives are also meaningful because engineering has already invested 6 weeks in the redesign
- The team wants 80% power at a 5% significance level
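The constraints above are enough to sketch how the choice of MDE drives sample size and runtime. The following is a minimal illustration, not a prescribed method: it assumes a two-sided test, the standard normal-approximation formula for comparing two proportions, one independent observation per checkout session, and a 50/50 split from day one (the instrumentation ramp is ignored). The function names `sample_size_per_arm` and `runtime_days` are hypothetical.

```python
import math
from statistics import NormalDist

BASELINE = 0.24           # checkout-start to completed-order conversion
DAILY_SESSIONS = 120_000  # eligible mobile checkout sessions per day
ALPHA = 0.05              # significance level (assumed two-sided)
POWER = 0.80


def sample_size_per_arm(baseline, relative_mde, alpha=ALPHA, power=POWER):
    """Sessions needed per arm to detect a relative lift of `relative_mde`.

    Uses n = (z_{1-alpha/2} + z_{power})^2 * (p1*q1 + p2*q2) / (p2 - p1)^2,
    the unpooled normal approximation for two proportions.
    """
    p1 = baseline
    p2 = baseline * (1 + relative_mde)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance_sum / (p2 - p1) ** 2
    return math.ceil(n)


def runtime_days(n_per_arm, daily_sessions=DAILY_SESSIONS):
    """Whole days needed to fill both arms, assuming an even 50/50 split."""
    per_arm_per_day = daily_sessions / 2
    return math.ceil(n_per_arm / per_arm_per_day)


if __name__ == "__main__":
    for mde in (0.01, 0.02, 0.03):
        n = sample_size_per_arm(BASELINE, mde)
        print(f"relative MDE {mde:.0%}: {n:,} per arm, {runtime_days(n)} day(s)")
```

Under these assumptions the requirement scales with 1/MDE², so halving the MDE roughly quadruples the traffic needed: a 2% relative MDE needs roughly 125,000 sessions per arm, while 1% needs roughly 500,000. If randomization is by user rather than by session, repeat sessions are correlated and the effective sample is smaller than the raw session count suggests.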
Tasks
- Define the null and alternative hypotheses, the primary metric, 2-4 guardrail metrics, and a realistic MDE. Explain how the MDE changes the required sample size and whether the test is feasible within 14 days.
- Calculate the required sample size per arm using the baseline conversion rate and your chosen MDE. Show the math and translate it into expected runtime given available traffic.
- Choose the unit of randomization and explain whether the unit of analysis should match it. Specify allocation, duration, and any stratification or blocking you would use.
- Pre-register an analysis plan: statistical test, peeking policy, treatment of secondary metrics, SRM checks, and what to do if guardrails worsen while the primary metric improves.
- State a clear ship / don't-ship / iterate rule that respects both statistical significance and practical significance.
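The SRM (sample ratio mismatch) check named in the analysis-plan task can be sketched as a chi-square goodness-of-fit test on the observed arm counts. This is one common way to do it, not the only one. The function name `srm_p_value`, the example counts, and the 0.001 alert threshold are illustrative assumptions, not values from the brief.

```python
import math


def srm_p_value(control_n, treatment_n, expected_control_share=0.5):
    """P-value of a chi-square goodness-of-fit test with 1 degree of freedom.

    Compares observed arm counts with the counts expected under the planned
    allocation. For 1 degree of freedom the chi-square survival function
    equals erfc(sqrt(x / 2)), so no external library is needed.
    """
    total = control_n + treatment_n
    expected_control = total * expected_control_share
    expected_treatment = total - expected_control
    chi2 = (
        (control_n - expected_control) ** 2 / expected_control
        + (treatment_n - expected_treatment) ** 2 / expected_treatment
    )
    return math.erfc(math.sqrt(chi2 / 2))


if __name__ == "__main__":
    SRM_THRESHOLD = 0.001  # assumed alert level; set this in the pre-registration
    # Hypothetical counts for illustration only.
    p = srm_p_value(control_n=500_000, treatment_n=505_000)
    status = "investigate before reading results" if p < SRM_THRESHOLD else "ok"
    print(f"SRM p-value: {p:.2e} ({status})")
```

A strict threshold is typical here because an SRM alert points to a randomization or logging defect rather than a treatment effect, and the check is usually run every time the data are inspected.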