Context
StreamCart, a grocery delivery app, is testing a new "Smart Reorder" widget on the home screen that suggests frequently purchased items. Product leadership wants to know not only whether the widget improves reorder conversion, but also what guardrails must be in place before a broader rollout.
Hypothesis Seed
The widget is expected to reduce friction for repeat buyers and increase the weekly reorder rate. However, it could also cause unintended harm: slowing app performance, cannibalizing discovery of higher-margin items, or driving up support contacts if recommendations look wrong.
Constraints
- Eligible traffic: 240,000 returning users per day
- Maximum experiment duration: 14 days
- Rollout decision required at the end of the test window
- False positives are costly because a bad rollout could hurt revenue and trust at scale
- False negatives are acceptable up to a point; the team would rather miss a small win than ship a harmful experience
Task
- Define the experiment hypothesis, the primary success metric, and 2-4 guardrail metrics you would pre-register for this rollout.
- Choose an explicit MDE, calculate the required sample size, and determine whether the test can be completed within 14 days.
- Specify the unit of randomization, allocation plan, duration, and any stratification or blocking you would use.
- Write the analysis plan: statistical test, peeking policy, multiple-comparison treatment, and how you will validate experiment integrity.
- State a clear ship / don't-ship / iterate decision rule that respects both the primary metric and the guardrails.
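For the experiment-integrity part of the analysis plan, a common concrete check is a sample ratio mismatch (SRM) test: a chi-square goodness-of-fit test on assignment counts against the planned split. A minimal stdlib-only sketch, with illustrative counts and an illustrative alert threshold (both are assumptions, not data from this brief):

```python
import math

def srm_pvalue(n_control, n_treatment, expected_ratio=0.5):
    """Chi-square (df=1) goodness-of-fit p-value for a sample ratio mismatch check."""
    total = n_control + n_treatment
    exp_c = total * expected_ratio
    exp_t = total - exp_c
    stat = (n_control - exp_c) ** 2 / exp_c + (n_treatment - exp_t) ** 2 / exp_t
    # For df=1, the chi-square survival function reduces to erfc(sqrt(x/2))
    return math.erfc(math.sqrt(stat / 2))

# Hypothetical assignment counts under a planned 50/50 split
p = srm_pvalue(120_480, 119_520)
srm_detected = p < 0.001  # SRM checks typically use a strict threshold
```

Because the SRM check is run routinely (often on every look at the data), a strict threshold such as p < 0.001 keeps the false-alarm rate low; a triggered SRM alert usually means the assignment or logging pipeline is broken and the results should not be trusted.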
Assume the baseline weekly reorder conversion among eligible returning users is 18.0%. Historical data supports treating the outcome as binary per user: each eligible user either places at least one reorder during the 7-day observation window or does not. You may assume a two-sided test with 5% significance and 80% power unless you justify a different choice.
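As a sketch of the sample-size and feasibility check, assuming a hypothetical MDE of +1.0 percentage point absolute on the stated 18.0% baseline (the MDE is an illustrative choice, not given in the brief), the standard normal-approximation formula for a two-sided two-proportion z-test gives:

```python
import math
from statistics import NormalDist

def two_prop_sample_size(p1, p2, alpha=0.05, power=0.80):
    """Per-arm n for a two-sided two-proportion z-test (normal approximation)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # ≈ 1.96 for alpha = 0.05
    z_b = NormalDist().inv_cdf(power)          # ≈ 0.84 for 80% power
    var = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_a + z_b) ** 2 * var / (p2 - p1) ** 2)

# Hypothetical MDE of +1.0 pp absolute: 18.0% -> 19.0%
n_per_arm = two_prop_sample_size(0.18, 0.19)
```

Under these assumptions the requirement is roughly 23.7k users per arm (about 47k total). With 240,000 eligible returning users per day, that fits easily even though the 7-day observation window leaves only about 7 days of the 14-day budget for enrollment; the binding constraint is more likely the guardrail metrics, which may need larger samples to detect small harms.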