Plan Sample Size for Checkout Test

Context

ShopNow, a mid-sized e-commerce app, wants to test a simplified mobile checkout flow that removes one confirmation step. The product manager believes this will increase completed purchases, but engineering wants a clear answer within a fixed launch window.

Hypothesis Seed

The proposed change reduces friction in checkout, so the team expects a modest lift in purchase conversion among users who start checkout. Because the change touches payment UX, the team is also concerned about accidental purchases, payment failures, and support contacts.

Constraints

Eligible traffic: 120,000 mobile users per day who start checkout
Randomization can only be done at the user_id level
Maximum experiment duration: 14 days, including ramp
Planned allocation after ramp: 50/50
Baseline checkout completion rate: 24%
Business wants to detect at least a 5% relative lift in checkout completion
False positives are costly because a bad checkout experience can harm trust and payment success; false negatives are acceptable if the effect is too small to matter operationally
You may assume a two-sided test with α = 0.05 and power = 80%
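
The constraints above are enough to rough out the power calculation. The following is a minimal sketch, not a definitive answer. It assumes the standard normal-approximation formula for comparing two proportions, treats the 5% relative lift as a move from 24% to 25.2%, and ignores the ramp period. The names `sample_size_per_arm`, `n_per_arm`, and `days_at_full_allocation` are illustrative and do not come from the exercise.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_arm(baseline, relative_lift, alpha=0.05, power=0.80):
    """Approximate per-arm n for a two-sided, two-proportion z-test.

    Assumption: normal approximation, equal allocation between arms.
    """
    p1 = baseline
    p2 = baseline * (1 + relative_lift)  # 5% relative lift on 24% -> 25.2%
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


n_per_arm = sample_size_per_arm(baseline=0.24, relative_lift=0.05)

# Assumption: all 120,000 daily checkout starters are eligible and split 50/50.
daily_eligible_users = 120_000
days_at_full_allocation = ceil(2 * n_per_arm / daily_eligible_users)

print(n_per_arm, days_at_full_allocation)
```

Under these assumptions the requirement comes out at roughly 20,000 users per arm, well below what 14 days of traffic provides. Duration is therefore more likely to be set by the ramp schedule and day-of-week coverage than by statistical power.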

Deliverables

State the null and alternative hypotheses, define the primary metric, and propose 2-4 guardrail metrics.
Calculate the required sample size per arm using the stated baseline and MDE, and determine whether the test can be completed within 14 days.
Choose the experiment design: unit of randomization, allocation/ramp, duration, and any stratification or blocking.
Pre-register an analysis plan covering the statistical test, peeking policy, multiple comparisons treatment, and how you would check for sample ratio mismatch.
Explain the ship / don't-ship rule, including what happens if the primary metric is significant but a guardrail worsens or the observed lift is below the planned MDE.
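
Two pieces of the analysis plan named in the deliverables, the sample ratio mismatch check and the primary statistical test, can be sketched with the standard library. This is one possible approach, not the expected answer. It assumes a chi-square goodness-of-fit test (1 degree of freedom) for SRM and a pooled two-proportion z-test for the primary metric. The function names and the counts in the example are hypothetical and made up for illustration. The 0.001 SRM threshold mentioned in the comment is a common convention and is not stated in the exercise.

```python
from math import erfc, sqrt
from statistics import NormalDist


def srm_p_value(n_control, n_treatment, expected_treatment_share=0.5):
    """Chi-square goodness-of-fit p-value (1 df) for the observed arm sizes."""
    total = n_control + n_treatment
    expected_treatment = total * expected_treatment_share
    expected_control = total - expected_treatment
    chi2 = (
        (n_control - expected_control) ** 2 / expected_control
        + (n_treatment - expected_treatment) ** 2 / expected_treatment
    )
    # For 1 degree of freedom, the chi-square survival function is erfc(sqrt(x/2)).
    return erfc(sqrt(chi2 / 2))


def two_proportion_z_test(x_control, n_control, x_treatment, n_treatment):
    """Return (relative lift, two-sided p-value) using a pooled standard error."""
    p_control = x_control / n_control
    p_treatment = x_treatment / n_treatment
    pooled = (x_control + x_treatment) / (n_control + n_treatment)
    se = sqrt(pooled * (1 - pooled) * (1 / n_control + 1 / n_treatment))
    z = (p_treatment - p_control) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_treatment / p_control - 1, p_value


# Hypothetical counts, invented for illustration only.
srm_p = srm_p_value(n_control=60_000, n_treatment=60_000)
lift, p_value = two_proportion_z_test(14_400, 60_000, 15_120, 60_000)

# Assumed convention: investigate assignment before reading results if srm_p < 0.001.
print(srm_p, lift, p_value)
```

Running the SRM check before looking at the primary metric keeps a broken assignment from being mistaken for a treatment effect.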

