Context
ShopFlow, a mid-size e-commerce app, launched a new post-purchase feature that lets customers rate their delivery experience and receive tailored help content. Product leadership wants to know whether the feature improves customer satisfaction without reducing purchase conversion.
Hypothesis Seed
The team believes the feature increases customer satisfaction by making support feel faster and more personalized. However, the extra UI may distract users or slow checkout, so conversion must not be harmed.
Constraints
- Eligible traffic: 120,000 unique users per day who reach the checkout funnel
- Current purchase conversion baseline: 12.0% per eligible user
- Current satisfaction survey response rate: 18% of purchasers
- Current top-box satisfaction baseline among survey responders: 68%
- Maximum experiment duration: 21 days
- The business cost of a false positive is high: shipping a feature that hurts conversion is worse than missing a small satisfaction gain
- The team needs a clear ship / do-not-ship recommendation by the end of the test window
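The traffic figures above imply how many survey responses the test can collect. A minimal sketch of that arithmetic follows. It assumes the baseline rates hold for the whole window, that each day's 120,000 eligible users are distinct people (returning users would lower the totals), and a 50/50 split. The function name `funnel_volumes` and its `treatment_share` parameter are illustrative, not part of the brief.

```python
# Figures taken from the Constraints list; everything else is an assumption.
ELIGIBLE_USERS_PER_DAY = 120_000
CONVERSION_RATE = 0.12        # purchases per eligible user
SURVEY_RESPONSE_RATE = 0.18   # survey responses per purchaser
MAX_DAYS = 21


def funnel_volumes(days=MAX_DAYS, treatment_share=0.5):
    """Expected counts over `days`, assuming baseline rates stay constant
    and daily eligible users do not overlap (an upper-bound assumption)."""
    eligible = ELIGIBLE_USERS_PER_DAY * days
    purchasers = eligible * CONVERSION_RATE
    responders = purchasers * SURVEY_RESPONSE_RATE
    return {
        "eligible_total": round(eligible),
        "purchasers_total": round(purchasers),
        "responders_total": round(responders),
        # Share of responders expected in the treatment arm.
        "responders_treatment_arm": round(responders * treatment_share),
    }


if __name__ == "__main__":
    for name, value in funnel_volumes().items():
        print(f"{name}: {value:,}")
```

The satisfaction metric is measured only on survey responders, so the responder count, not the eligible-user count, is the binding sample for that metric.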
Task
- Define the experiment hypothesis, the primary metric for satisfaction, and guardrail metrics that protect conversion and user experience. Be explicit about the minimum detectable effect (MDE).
- Choose the unit of randomization (user, session, or another unit) and justify the choice. State the allocation and test duration.
- Calculate the required sample size using the provided baselines. Show the math and determine whether the experiment is feasible within 21 days.
- Pre-register an analysis plan: statistical test, handling of multiple metrics, peeking policy, and what you would do if the unit of analysis differs from the unit of randomization.
- List the main risks to valid inference, including novelty effects, sample ratio mismatch, and any interference or SUTVA concerns, and explain how you would mitigate them.
Assume the feature is shown consistently to treated users across web and app once assigned, and that survey instrumentation is already reliable enough for experimentation use.
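One way to approach the sample-size step is the standard normal-approximation formula for comparing two proportions. The sketch below is an illustration under stated assumptions, not a prescribed answer: the 2-percentage-point absolute MDE, the two-sided alpha of 0.05, and the 80% power are placeholder choices, and `n_per_arm` and `days_needed` are hypothetical helper names. Only the 68% baseline and the daily traffic figures come from the brief.

```python
import math
from statistics import NormalDist


def n_per_arm(p_base, mde_abs, alpha=0.05, power=0.80):
    """Per-arm sample size for a two-sided two-proportion z-test
    (normal approximation, equal allocation)."""
    p_treat = p_base + mde_abs
    p_bar = (p_base + p_treat) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p_base * (1 - p_base) + p_treat * (1 - p_treat))
    ) ** 2
    return math.ceil(numerator / mde_abs ** 2)


def days_needed(n_arm, units_per_day):
    """Days to fill both arms at a 50/50 split, given daily analyzable units."""
    return math.ceil(2 * n_arm / units_per_day)


if __name__ == "__main__":
    # Daily survey responders implied by the brief: 120,000 * 12% * 18%.
    responders_per_day = 120_000 * 0.12 * 0.18
    # ASSUMPTION: a 2-point absolute lift in top-box satisfaction as the MDE.
    n = n_per_arm(p_base=0.68, mde_abs=0.02)
    print("responders needed per arm:", n)
    print("days needed:", days_needed(n, responders_per_day))
```

The same function can be reused for the conversion guardrail by passing the 12.0% baseline and a chosen tolerance. Because a false positive is stated to be costly, a stricter alpha than the placeholder 0.05 may be appropriate, and that raises the required sample.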