Test Satisfaction Feature Safely

Context

ShopFlow, a mid-size e-commerce app, launched a new post-purchase feature that lets customers rate their delivery experience and receive tailored help content. Product leadership wants to know whether the feature improves customer satisfaction without reducing purchase conversion.

Hypothesis Seed

The team believes the feature increases customer satisfaction by making support feel faster and more personalized. However, the extra UI may distract users or slow checkout, so conversion must not be harmed.

Constraints

Eligible traffic: 120,000 unique users per day who reach the checkout funnel
Current purchase conversion baseline: 12.0% per eligible user
Current satisfaction survey response rate: 18% of purchasers
Current top-box satisfaction baseline among survey responders: 68%
Maximum experiment duration: 21 days
Business cost of a false positive is high: shipping a feature that hurts conversion is worse than missing a small satisfaction gain
The team needs a clear ship / do-not-ship recommendation by the end of the test window
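
The constraints above imply a funnel that narrows sharply before any satisfaction data is observed. As a rough sketch, the arithmetic can be laid out as follows. It assumes a 50/50 two-arm split and that each day's 120,000 eligible users are distinct people across the window; neither assumption is stated in the prompt, and repeat visitors would lower the unique-user totals.

```python
# Funnel arithmetic from the stated constraints.
DAILY_ELIGIBLE_USERS = 120_000   # users reaching the checkout funnel per day
CONVERSION_RATE = 0.12           # purchases per eligible user
SURVEY_RESPONSE_RATE = 0.18      # survey responses per purchaser
MAX_DAYS = 21                    # maximum experiment duration
N_ARMS = 2                       # assumption: control and treatment, split 50/50


def funnel_counts(days=MAX_DAYS, arms=N_ARMS):
    """Return expected per-arm counts at each funnel stage over `days` days.

    Assumes daily eligible users are distinct across days (not stated in the prompt).
    """
    eligible = DAILY_ELIGIBLE_USERS * days
    purchasers = round(eligible * CONVERSION_RATE)
    responders = round(purchasers * SURVEY_RESPONSE_RATE)
    return {
        "eligible_per_arm": eligible // arms,
        "purchasers_per_arm": purchasers // arms,
        "responders_per_arm": responders // arms,
    }


if __name__ == "__main__":
    for stage, count in funnel_counts().items():
        print(f"{stage}: {count:,}")
```

Under these assumptions, only about 2% of eligible users (12% x 18%) ever contribute a satisfaction response, so the satisfaction metric has far less data behind it than the conversion guardrail.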

Task

Define the experiment hypothesis, the primary metric for satisfaction, and guardrail metrics that protect conversion and user experience. Be explicit about the minimum detectable effect (MDE).
Choose the unit of randomization and explain whether you would randomize by user, session, or another unit. State the allocation and test duration.
Calculate the required sample size using the provided baselines. Show the math and determine whether the experiment is feasible within 21 days.
Pre-register an analysis plan: statistical test, handling of multiple metrics, peeking policy, and what you would do if the unit of analysis differs from the unit of randomization.
List the main risks to valid inference, including novelty effects, sample ratio mismatch, and any interference or SUTVA concerns, and explain how you would mitigate them.

Assume the feature is shown consistently to treated users across web and app once assigned, and that survey instrumentation is already reliable enough for experimentation use.
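
For the sample-size step of the task, one common approach is the normal-approximation formula for comparing two proportions. The sketch below is illustrative only: the 2-percentage-point absolute MDE, the two-sided alpha of 0.05, the 80% power, and the 50/50 split are all assumptions chosen for the example, not values given in the prompt, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def required_n_per_arm(p_baseline, mde_abs, alpha=0.05, power=0.80):
    """Per-arm sample size for a two-sided, two-proportion z-test.

    Uses the pooled variance under the null and the unpooled variance
    under the alternative (normal approximation).
    """
    p1 = p_baseline
    p2 = p_baseline + mde_abs
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / mde_abs ** 2)


# Baselines from the prompt; the MDE is a hypothetical choice for illustration.
TOP_BOX_BASELINE = 0.68
ASSUMED_MDE_ABS = 0.02

# Survey responders per arm per day: 120,000 * 12% * 18% / 2 arms.
daily_responders_per_arm = round(120_000 * 0.12 * 0.18) // 2

n_per_arm = required_n_per_arm(TOP_BOX_BASELINE, ASSUMED_MDE_ABS)
days_needed = math.ceil(n_per_arm / daily_responders_per_arm)

if __name__ == "__main__":
    print(f"responders needed per arm: {n_per_arm:,}")
    print(f"days needed at 50/50 allocation: {days_needed}")
```

With these assumed inputs the requirement comes out at roughly 8,400 survey responders per arm, which the stated traffic would supply in about a week. A smaller MDE, a stricter alpha (reasonable given the high cost of a false positive), or a correction for multiple metrics would raise that figure, so the inputs should be fixed before the test starts.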
