Conflicting Metrics in Checkout Test

Context

ShopNow is testing a redesigned mobile checkout that reduces the number of form fields and adds one-tap address autofill. Early directional reads suggest checkout completion improved, but average order value may have declined.

Hypothesis Seed

The product team believes the simpler checkout will increase completed purchases by reducing friction, especially for first-time buyers. However, finance is concerned the faster flow may reduce add-on purchases and lower revenue per visitor.

Constraints

Eligible traffic: 240,000 mobile checkout starters per day
Maximum experiment duration: 14 days; a launch decision is needed before a seasonal campaign
Allocation can be 50/50 after a brief instrumentation ramp
Baseline checkout completion rate: 38%
Baseline revenue per checkout starter: $24.00, with standard deviation $60
Business requirement: detect at least a 2% relative lift in checkout completion
Guardrail: do not ship if revenue per checkout starter drops by more than 1% relative, or payment error rate rises by more than 0.2 percentage points
False positives are costly because a bad launch during the campaign would directly hurt revenue; false negatives are acceptable if the team can iterate next sprint
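
The relative thresholds above are easier to reason about once converted to absolute units. The following is a minimal sketch of that arithmetic, using only the figures listed in the constraints; the variable names are illustrative and not part of the original brief.

```python
# Figures taken directly from the constraints list.
DAILY_STARTERS = 240_000          # eligible mobile checkout starters per day
MAX_DAYS = 14                     # maximum experiment duration
BASELINE_COMPLETION = 0.38        # baseline checkout completion rate
BASELINE_REVENUE = 24.00          # baseline revenue per checkout starter, USD
RELATIVE_MDE = 0.02               # 2% relative lift in completion
REVENUE_GUARDRAIL_REL = 0.01      # 1% relative drop in revenue per starter

# 2% relative lift on a 38% baseline, expressed in percentage points.
mde_abs_pp = BASELINE_COMPLETION * RELATIVE_MDE * 100

# 1% relative drop on $24.00, expressed in dollars per checkout starter.
revenue_drop_usd = BASELINE_REVENUE * REVENUE_GUARDRAIL_REL

# Upper bound on traffic if the test ran the full 14 days at full exposure.
max_total_starters = DAILY_STARTERS * MAX_DAYS
max_per_arm = max_total_starters // 2  # 50/50 allocation

print(f"Completion MDE: {mde_abs_pp:.2f} percentage points")
print(f"Revenue guardrail: ${revenue_drop_usd:.2f} per starter")
print(f"Max traffic: {max_total_starters:,} total, {max_per_arm:,} per arm")
```

The traffic figure is an upper bound: it ignores the instrumentation ramp and assumes every eligible starter is enrolled.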

Task

Define the experiment hypothesis, primary metric, and guardrails, and explain how you would make a recommendation when the primary metric is positive but a key guardrail is negative.
Calculate the required sample size and expected runtime for the primary metric using the stated MDE and the alpha and power levels you choose, justified by the error-cost constraint above.
Choose the unit of randomization and analysis, and explain any mismatch risks.
Pre-register an analysis plan covering the statistical test, multiple comparisons policy, peeking policy, and the exact ship / don't-ship rule.
Name the main pitfalls you would watch for in this experiment, including at least one issue related to conflicting metrics or biased reads across segments.
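
For the sample-size item in the task list, one common approach is the normal-approximation formula for comparing two proportions. The sketch below applies it to the stated baseline and MDE. The brief does not give numeric alpha or power, so the values used here (two-sided alpha of 0.05, power of 0.80) are assumptions to be replaced with whatever the analyst justifies; the function name is hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(p_base, rel_lift, alpha, power):
    """Per-arm n for a two-sided, two-proportion z-test (normal approximation)."""
    p_treat = p_base * (1 + rel_lift)
    delta = p_treat - p_base
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    variance_sum = p_base * (1 - p_base) + p_treat * (1 - p_treat)
    return math.ceil((z_alpha + z_beta) ** 2 * variance_sum / delta ** 2)


# From the constraints: 38% baseline, 2% relative MDE, 240,000 starters/day.
# ASSUMED (not in the brief): alpha = 0.05 two-sided, power = 0.80.
n_per_arm = sample_size_per_arm(p_base=0.38, rel_lift=0.02, alpha=0.05, power=0.80)
total_n = 2 * n_per_arm
days_needed = math.ceil(total_n / 240_000)

print(f"Per arm: {n_per_arm:,}  Total: {total_n:,}  Days at full traffic: {days_needed}")
```

Under these assumptions the traffic requirement is met well inside the 14-day limit, so the arithmetic minimum is only a lower bound on runtime. A stricter alpha, which the cost of false positives may argue for, raises the required sample size but should still fit within the available traffic.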

