P-Value Meaning in Checkout A/B

Business Context

You’re a data scientist at CartJet, a high-volume e-commerce marketplace (~8M weekly active users) that is redesigning its checkout page to reduce friction and increase completed purchases. The product team ran a 10-day A/B test and is excited because the treatment shows a higher conversion rate.

In the launch meeting, a PM says: “The p-value is 0.03, so there’s a 97% chance the new checkout is better.” Another stakeholder says: “A p-value of 0.03 means only 3% of the observed lift is due to randomness.” You need to correct the interpretation and make a recommendation that’s statistically sound and business-relevant.

Problem Statement

Using the experiment results below, compute the p-value and then explain what the p-value does and does not mean in this hypothesis test. Finally, connect the statistical result to a rollout decision, including at least one caveat.

Given Data

Item	Control (A)	Treatment (B)
Users exposed (n)	120,000	120,000
Purchases (x)	7,800	8,160
Observed conversion rate (x/n)	6.50%	6.80%
Significance level (α)	-	0.05

Notes: Traffic split was 50/50 by user_id hash. A user is counted once (first exposure only).

Requirements

State the null and alternative hypotheses for whether the redesign changes conversion.
Compute the two-proportion z-test statistic using a pooled standard error under $H_0$.
Compute the (two-sided) p-value.
In 2–4 sentences, define the p-value in this context and explicitly address why the two stakeholder statements above are incorrect.
Make a rollout recommendation (ship / don’t ship / run longer / segment) and justify it with both statistical and practical considerations.
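
The calculation asked for in the requirements above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not a reference solution: the function name `two_proportion_z_test` and its argument names are hypothetical, and the counts are taken from the Given Data table. The two-sided p-value uses the identity $2(1 - \Phi(|z|)) = \operatorname{erfc}(|z|/\sqrt{2})$.

```python
from math import erfc, sqrt


def two_proportion_z_test(x_a, n_a, x_b, n_b):
    """Two-sided two-proportion z-test with a pooled standard error.

    Hypothetical helper for illustration; x_* are purchase counts and
    n_* are exposed-user counts for control (a) and treatment (b).
    """
    p_a = x_a / n_a
    p_b = x_b / n_b
    # Under H0 both arms share one conversion rate, estimated by pooling.
    p_pool = (x_a + x_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided tail probability of a standard normal beyond |z|.
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value


# Counts from the Given Data table.
z, p_value = two_proportion_z_test(7_800, 120_000, 8_160, 120_000)
print(f"z = {z:.3f}, two-sided p-value = {p_value:.4f}")
```

With the table's counts this comes out near z ≈ 2.95 with a two-sided p-value near 0.003. The 0.03 in the PM's quote is the stakeholder's figure, not an input to the calculation. Whatever the number, it is the probability of seeing a difference at least this extreme if the two checkouts truly converted at the same rate; it is not the probability that the treatment is better, nor the share of the lift caused by chance.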

Assumptions and Constraints

Random assignment and independence across users.
Large-sample normal approximation is acceptable (check that $np$ and $n(1-p)$ are sufficiently large in each arm).
You are not adjusting for multiple comparisons (assume this was the primary metric and primary test).
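
The large-sample check mentioned in the assumptions above can be made concrete with a short sketch. The helper name `expected_counts_under_h0` and the threshold of 10 are assumptions for illustration (10 is a common rule of thumb, not something the problem specifies). The expected counts use the pooled rate, matching the pooled standard error under $H_0$.

```python
def expected_counts_under_h0(x_a, n_a, x_b, n_b):
    """Expected successes and failures per arm under the pooled rate.

    Hypothetical helper; returns {arm: (n * p, n * (1 - p))}.
    """
    p_pool = (x_a + x_b) / (n_a + n_b)
    return {
        "A": (n_a * p_pool, n_a * (1 - p_pool)),
        "B": (n_b * p_pool, n_b * (1 - p_pool)),
    }


MIN_COUNT = 10  # assumed rule-of-thumb threshold, not given in the problem

counts = expected_counts_under_h0(7_800, 120_000, 8_160, 120_000)
approx_ok = all(
    successes >= MIN_COUNT and failures >= MIN_COUNT
    for successes, failures in counts.values()
)
for arm, (successes, failures) in counts.items():
    print(f"arm {arm}: np = {successes:.0f}, n(1-p) = {failures:.0f}")
print("normal approximation reasonable:", approx_ok)
```

Both arms have thousands of expected purchases and non-purchases, so the normal approximation is not the weak point of this analysis; the independence and single-primary-test assumptions deserve more scrutiny.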
