Context
StreamCart, a grocery delivery app, is testing a redesigned checkout page intended to reduce friction and increase order completion. Three days into the experiment, the treatment shows a large conversion lift on mobile Safari, a drop on Android, and an unexpected 56/44 traffic split instead of 50/50.
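A 56/44 split where 50/50 was configured is the classic signature of a sample-ratio mismatch (SRM), and it can be checked with a chi-square goodness-of-fit test. A minimal sketch, assuming illustrative assignment counts (the brief gives only the ratio, not the totals):

```python
import math

def srm_pvalue(n_control: int, n_treatment: int) -> float:
    """Chi-square goodness-of-fit p-value against an expected 50/50 split (1 df)."""
    total = n_control + n_treatment
    expected = total / 2
    chi2 = ((n_control - expected) ** 2 / expected
            + (n_treatment - expected) ** 2 / expected)
    # For 1 degree of freedom, the chi-square survival function is erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))

# Hypothetical counts matching the observed 56/44 split.
p = srm_pvalue(56_000, 44_000)
```

At realistic traffic volumes the p-value is astronomically small, which means the split is almost certainly not chance: the assignment and logging pipeline should be audited before any metric from this experiment is trusted.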
Hypothesis Seed
The new checkout design shortens the form and highlights saved payment methods. The product team believes it will improve checkout completion without increasing payment failures or customer support contacts. Because the data already looks suspicious, the key challenge is not only designing the experiment, but deciding how to handle inconsistent or potentially invalid results.
Constraints
- Eligible traffic: 120,000 checkout starts per day
- Maximum runtime: 14 days; leadership needs a decision before a seasonal campaign
- Planned allocation: 50/50 after a 5% canary for instrumentation checks
- Baseline checkout completion rate: 38%
- False positives are costly because shipping a broken checkout directly impacts revenue; false negatives are acceptable if they prevent shipping a buggy experience
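The runtime feasibility implied by these constraints can be sketched with the standard two-proportion sample-size formula. The 1-point absolute MDE, α = 0.05, and 80% power below are illustrative assumptions, not values given in the brief:

```python
import math

def n_per_arm(p_base: float, mde_abs: float) -> int:
    """Per-arm sample size for a two-sided two-proportion z-test
    at alpha = 0.05 and 80% power."""
    z_alpha, z_beta = 1.959964, 0.841621  # normal quantiles for alpha/2 and power
    p1, p2 = p_base, p_base + mde_abs
    p_bar = (p1 + p2) / 2
    num = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
           + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(num / mde_abs ** 2)

n = n_per_arm(0.38, 0.01)   # baseline 38%, illustrative 1-point absolute MDE
days = 2 * n / 120_000      # 120k eligible checkout starts/day at a 50/50 split
```

At a 1-point MDE the test needs roughly 37k users per arm (under a day of traffic); even a 0.5-point MDE (~148k per arm, about 2.5 days) fits comfortably inside the 14-day cap, so duration is not the binding constraint here.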
Deliverables
- Define the hypothesis, primary metric, guardrails, and a realistic MDE for this test.
- Calculate the required sample size and determine whether the test can complete within the 14-day runtime.
- Choose the unit of randomization and explain how you would investigate suspicious data before trusting the result (for example SRM, logging bugs, or segment-specific anomalies).
- Pre-register the analysis plan: test choice, peeking policy, multiple-comparison handling, and what to do if data quality checks fail.
- Provide a clear ship / don’t-ship / investigate decision rule that respects guardrails and explicitly handles the case where the experiment is statistically significant but the data appears inconsistent.
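One way to make such a decision rule unambiguous is to encode the precedence of checks explicitly: data validity outranks guardrails, which outrank statistical significance. A sketch with hypothetical inputs and thresholds (none of these names are prescribed by the brief):

```python
def decide(srm_ok: bool, guardrails_ok: bool,
           significant: bool, lift_positive: bool) -> str:
    """Ship / don't-ship / investigate rule with explicit check precedence."""
    if not srm_ok:
        # A significant result on top of a sample-ratio mismatch is untrustworthy.
        return "investigate"
    if not guardrails_ok:
        # Payment failures or support contacts regressed: never ship on lift alone.
        return "don't ship"
    if significant and lift_positive:
        return "ship"
    return "don't ship"

# The scenario in the brief: a significant lift but a 56/44 split.
outcome = decide(srm_ok=False, guardrails_ok=True,
                 significant=True, lift_positive=True)
```

Here the experiment is statistically significant yet the function returns "investigate", which is exactly the case the last deliverable asks the analysis plan to handle.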