Stopping a Failing Checkout Test

Context

ShopNow, a mobile commerce app, is testing a simplified one-page checkout intended to reduce friction and improve completed purchase rate. Three days into the experiment, the treatment looks worse than control, and leadership asks whether the team should stop the test early.

Hypothesis Seed

The new checkout removes one confirmation step and pre-fills shipping details for logged-in users. The product team believes this will increase purchase conversion, but there is concern that it may also create confusion, increase payment failures, or cause a short-term novelty effect.

Constraints

Eligible traffic: 120,000 checkout starts per day
80% of traffic is mobile app, 20% mobile web; desktop is excluded
Maximum experiment window: 14 days, because a seasonal promotion starts after that
Baseline completed purchase rate from checkout start is 40%
The smallest worthwhile improvement is a 2% relative lift in completed purchase rate (from 40% to 40.8%, or 0.8 percentage points absolute)
False positives are costly because a bad checkout experience directly harms revenue; false negatives are acceptable but still undesirable
The team wants a clear rule for when to stop early for harm versus when to wait for the pre-registered readout
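
As a rough illustration of how these constraints translate into a sample size, the sketch below applies the standard normal-approximation formula for comparing two proportions. The 40% baseline, 2% relative lift, and 120,000 daily checkout starts come from the constraints above. The two-sided alpha of 0.05, the 80% power, the 50/50 allocation, and the function name are assumptions chosen for illustration, not part of the prompt. The sketch also treats each checkout start as an independent unit, which overstates precision if the same user starts checkout more than once.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_arm(baseline, relative_lift, alpha=0.05, power=0.80):
    """Per-arm sample size for a two-sided, two-proportion z-test.

    alpha and power are illustrative defaults (assumptions), not values
    given in the prompt.
    """
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    delta = p2 - p1
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance_sum / delta ** 2)


DAILY_CHECKOUT_STARTS = 120_000  # from the constraints

n_per_arm = sample_size_per_arm(baseline=0.40, relative_lift=0.02)
total_needed = 2 * n_per_arm  # assumes a 50/50 split
days_needed = ceil(total_needed / DAILY_CHECKOUT_STARTS)

print(f"per arm: {n_per_arm:,}  total: {total_needed:,}  days: {days_needed}")
```

Under these assumptions the requirement comes out to roughly 59,000 checkout starts per arm, which the stated traffic covers in about a day. A stricter alpha, chosen because false positives are costly, raises that number, and a plan would normally still run at least one full week to cover day-of-week effects.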

Deliverables

State the null and alternative hypotheses, including whether you would use a one-sided or two-sided test.
Define the primary metric, two to four guardrail metrics, and an explicit minimum detectable effect (MDE). Explain which metric should govern an early stop decision if the test appears to be failing.
Calculate the required sample size and expected duration using the provided traffic assumptions.
Propose the experiment design: unit of randomization, allocation, duration, and any stratification.
Pre-register the analysis plan, including the statistical test, peeking/early stopping policy, multiple-comparison handling, and how you would respond to sample ratio mismatch or novelty effects.
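
One way to make the "stop early for harm" rule concrete is a one-sided two-proportion z-test on completed purchase rate, evaluated only at pre-registered interim looks, with the harm alpha split evenly across those looks (a Bonferroni correction). This is a deliberately conservative sketch, not the only valid policy: group-sequential boundaries such as O'Brien-Fleming are a common alternative. The function name, the overall harm alpha of 0.025, the four planned looks, and the counts in the usage example are all hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def harm_check(control_conv, control_n, treat_conv, treat_n,
               overall_alpha=0.025, planned_looks=4):
    """One-sided test that treatment converts worse than control.

    overall_alpha and planned_looks are assumed values; dividing alpha by
    the number of looks (Bonferroni) keeps the overall false-stop rate at
    or below overall_alpha across all interim checks.
    """
    p_control = control_conv / control_n
    p_treat = treat_conv / treat_n
    pooled = (control_conv + treat_conv) / (control_n + treat_n)
    std_err = sqrt(pooled * (1 - pooled) * (1 / control_n + 1 / treat_n))
    z = (p_treat - p_control) / std_err
    p_value = NormalDist().cdf(z)  # small only when treatment is worse
    per_look_alpha = overall_alpha / planned_looks
    return {
        "z": z,
        "p_value": p_value,
        "per_look_alpha": per_look_alpha,
        "stop_for_harm": p_value < per_look_alpha,
    }


# Hypothetical day-3 snapshot: 180,000 checkout starts per arm,
# control at 40.0% and treatment at 39.0% completed purchases.
result = harm_check(control_conv=72_000, control_n=180_000,
                    treat_conv=70_200, treat_n=180_000)
print(result["stop_for_harm"], round(result["z"], 2))
```

With the hypothetical counts above, a full percentage-point drop clears the per-look threshold, so the rule would call for stopping. A smaller gap would not trigger the stop, and the test would continue to the pre-registered readout.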
