Diagnose Pitfalls in Checkout Test

Context

ShopNow is testing a redesigned mobile checkout flow intended to reduce friction and increase completed purchases. The product lead wants a decision within two weeks because the current quarter's launch calendar closes soon.

Hypothesis Seed

The new checkout removes one confirmation step and pre-fills saved shipping details. The team believes this will increase purchase conversion, but it could also create accidental purchases, payment failures, or short-lived novelty effects.

Constraints

Eligible traffic: 120,000 mobile checkout-start sessions per day
85% of traffic is logged-in users; 15% are guest users
Maximum experiment length: 14 days
Randomization can be done at the user_id level for logged-in users and at the device_id level for guests
A false positive is costly because a bad checkout experience directly hurts revenue and trust
A false negative also matters because checkout is a major growth lever, but it is less costly than shipping a harmful change
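
The traffic constraints set a hard ceiling on how much data the test can collect. The arithmetic can be sketched as follows, assuming a 50/50 split between control and treatment (the split is an assumption; the constraints do not fix an allocation) and counting sessions rather than unique users:

```python
# Traffic budget implied by the stated constraints.
# Assumption (not in the problem statement): two arms with a 50/50 split.
SESSIONS_PER_DAY = 120_000
MAX_DAYS = 14
LOGGED_IN_PCT = 85   # the remaining 15% are guest sessions
ARMS = 2             # assumed: one control arm, one treatment arm

total_sessions = SESSIONS_PER_DAY * MAX_DAYS
logged_in_sessions = total_sessions * LOGGED_IN_PCT // 100
guest_sessions = total_sessions - logged_in_sessions
sessions_per_arm = total_sessions // ARMS

print(f"total sessions over {MAX_DAYS} days: {total_sessions:,}")
print(f"logged-in sessions: {logged_in_sessions:,}")
print(f"guest sessions: {guest_sessions:,}")
print(f"sessions per arm: {sessions_per_arm:,}")
```

These figures are sessions, not randomization units. Because assignment happens at user_id or device_id, the number of independent units is lower whenever users return for more than one session, and the problem statement does not say by how much.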

Task

Design the experiment and explain the main pitfalls you would watch for when analyzing it.

State the null and alternative hypotheses, define the primary metric, at least three guardrails, and an explicit MDE.
Calculate the required sample size and determine whether the test can be completed within 14 days given the available traffic.
Choose the unit of randomization, allocation, duration, and any stratification or blocking you would use.
Pre-register an analysis plan: statistical test, peeking policy, multiple-comparisons policy, and how you will handle any mismatch between unit of randomization and unit of analysis.
Identify the most important analysis pitfalls for this experiment—such as peeking, novelty effects, network interference, SUTVA violations, Simpson's paradox, and sample ratio mismatch—and explain how each would affect interpretation and what mitigation you would use.
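
Two of the items above reduce to short calculations: the required sample size and the sample ratio mismatch (SRM) check. The sketch below uses the standard normal-approximation formula for a two-sided two-proportion test and a one-degree-of-freedom chi-square test for SRM. The baseline conversion rate, the MDE, alpha, and power in the usage lines are hypothetical placeholders, since the problem statement leaves them for the answer to choose.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(baseline, mde_abs, alpha=0.05, power=0.80):
    """Units per arm for a two-sided two-proportion z-test (normal approximation).

    baseline: control conversion rate; mde_abs: absolute lift to detect.
    """
    p1 = baseline
    p2 = baseline + mde_abs
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / mde_abs ** 2)


def srm_p_value(n_control, n_treatment, expected_share_control=0.5):
    """Chi-square goodness-of-fit p-value (1 degree of freedom) for SRM."""
    total = n_control + n_treatment
    exp_control = total * expected_share_control
    exp_treatment = total - exp_control
    chi2 = ((n_control - exp_control) ** 2 / exp_control
            + (n_treatment - exp_treatment) ** 2 / exp_treatment)
    # With 1 degree of freedom the chi-square survival function is erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


# Hypothetical inputs: a 50% baseline and a 1-point absolute MDE are NOT from the problem.
n_needed = sample_size_per_arm(baseline=0.50, mde_abs=0.01)
print(f"units needed per arm: {n_needed:,}")

# Hypothetical observed counts under an intended 50/50 split.
print(f"SRM p-value: {srm_p_value(845_000, 835_000):.2e}")
```

A stricter alpha, which the stated cost of false positives may justify, increases the required sample size. The formula also assumes one independent observation per randomized unit, so session-level analysis under user-level randomization would need a correction such as clustered standard errors or the delta method.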

Be explicit about the final ship / don't-ship rule. Your answer should assume leadership will ask for a recommendation immediately after the readout, so the decision framework must be operational rather than theoretical.
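
One way to make the rule operational is to write it down as a function whose inputs are fixed before the test starts. The sketch below is illustrative only: the `Readout` fields, the guardrail names, and the 0.001 SRM threshold are all hypothetical choices, not values given in the problem.

```python
from dataclasses import dataclass, field


@dataclass
class Readout:
    srm_p_value: float   # p-value from the sample ratio mismatch check
    lift_ci_low: float   # lower bound of the interval on absolute conversion lift
    # Hypothetical guardrail name -> True if that guardrail was breached.
    guardrail_breached: dict = field(default_factory=dict)


def decide(readout: Readout, srm_alpha: float = 0.001) -> str:
    """Return a pre-registered decision; checks run in a fixed order."""
    # 1. An SRM failure invalidates the comparison, so no ship decision is made.
    if readout.srm_p_value < srm_alpha:
        return "invalid: investigate sample ratio mismatch"
    # 2. Any breached guardrail blocks the launch regardless of the primary metric.
    breached = sorted(name for name, hit in readout.guardrail_breached.items() if hit)
    if breached:
        return "don't ship: guardrail breached (" + ", ".join(breached) + ")"
    # 3. Ship only if the whole interval on the primary metric is above zero.
    if readout.lift_ci_low > 0:
        return "ship"
    return "don't ship: no reliable improvement"


example = Readout(
    srm_p_value=0.42,
    lift_ci_low=0.003,
    guardrail_breached={"payment_failure_rate": False, "refund_rate": False},
)
print(decide(example))
```

Ordering the checks this way reflects the stated asymmetry: validity and harm are evaluated before benefit, so a strong primary-metric result cannot override a failed guardrail.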
