Context
The Facebook Feed team is testing a new comment composer prompt that appears above the comment box on public posts. The product manager believes the prompt will increase meaningful commenting, but leadership wants to ship only if the gain is large enough to matter to users and does not hurt Feed quality.
Hypothesis Seed
The treatment changes the prompt text from a neutral placeholder to a more action-oriented one (for example, encouraging users to “share what you think”). The team expects a small lift in comment creation rate, but this is a high-traffic surface, so even tiny effects may become statistically significant. Your task is to design an experiment that explicitly distinguishes statistical significance from practical significance.
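To make the distinction concrete, the sketch below runs a pooled two-proportion z-test on the same tiny lift at two sample sizes. The 5% baseline comment creation rate and the 0.4% relative lift are hypothetical values chosen for illustration (the scenario does not give them), and the observed rates are treated as exactly equal to the true rates.

```python
import math

def two_proportion_p_value(rate_control, rate_treatment, n_per_arm):
    """Two-sided p-value of a pooled two-proportion z-test with equal arm sizes."""
    pooled = (rate_control + rate_treatment) / 2
    std_err = math.sqrt(2 * pooled * (1 - pooled) / n_per_arm)
    z = (rate_treatment - rate_control) / std_err
    # For a standard normal Z, P(|Z| > z) = erfc(z / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))

BASELINE = 0.05    # hypothetical baseline comment creation rate
TREATED = 0.0502   # hypothetical treated rate: +0.4% relative lift

p_small = two_proportion_p_value(BASELINE, TREATED, 100_000)
p_large = two_proportion_p_value(BASELINE, TREATED, 25_000_000)

print(f"100k users per arm: p = {p_small:.3f}")
print(f"25M users per arm:  p = {p_large:.4f}")
```

The effect is identical in both runs; only the sample size changes. At Feed scale the p-value falls well below 0.05 even though a 0.4% relative lift may be too small to justify shipping, which is why the ship decision needs a practical threshold in addition to a significance test.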
Constraints
- Eligible traffic: 12 million Feed viewers per day globally
- Only 30% of eligible viewers see at least one public post where the prompt can appear
- Maximum experiment duration: 14 days, after which the team must make a ship decision
- The team can allocate at most 50% of eligible traffic to treatment
- False positives are costly because low-quality comments can degrade Feed quality and increase moderation load
- False negatives are also costly because comments are a key engagement signal for creators
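The traffic constraints above translate into an upper bound on available sample, sketched here as plain arithmetic. It assumes a 50/50 split at the maximum allowed treatment share and counts user-days, not unique users; because the same viewers return on multiple days, the number of unique users over 14 days will be lower than the user-day total.

```python
DAILY_VIEWERS = 12_000_000   # eligible Feed viewers per day
EXPOSURE_RATE = 0.30         # share who see a post where the prompt can appear
MAX_TREATMENT_SHARE = 0.50   # cap on traffic allocated to treatment
MAX_DAYS = 14                # maximum experiment duration

# Viewers per day who can actually see the prompt.
exposed_per_day = round(DAILY_VIEWERS * EXPOSURE_RATE)

# Assumption: allocate the full 50% to treatment, the rest to control.
treatment_per_day = round(exposed_per_day * MAX_TREATMENT_SHARE)
control_per_day = exposed_per_day - treatment_per_day

# Upper bound on exposed user-days per arm over the full window.
treatment_user_days = treatment_per_day * MAX_DAYS

print(f"Exposed viewers per day:      {exposed_per_day:,}")
print(f"Treatment viewers per day:    {treatment_per_day:,}")
print(f"Treatment user-days, 14 days: {treatment_user_days:,}")
```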
Deliverables
- Define the null and alternative hypotheses, and explain how practical significance differs from statistical significance in this setting.
- Choose a primary metric, 2 to 4 guardrail metrics, and an explicit minimum detectable effect (MDE) that would justify shipping.
- Calculate the required sample size and expected runtime using the provided traffic constraints.
- Specify the experiment design: unit of randomization, allocation, duration, stratification, and analysis plan, including peeking and multiple-comparison policy.
- State a clear ship / don’t-ship / iterate rule for these outcomes: (a) statistically significant and practically meaningful, (b) statistically significant but too small to matter, (c) not statistically significant, (d) guardrail breach.
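As a starting point for the sample-size and decision-rule deliverables, here is a minimal sketch, not a model answer. The 5% baseline rate, the 2% relative MDE, alpha = 0.05, and 80% power are all hypothetical inputs; `sample_size_per_arm` and `decide` are illustrative helper names; and the mapping of outcomes (a) through (d) to actions is only one possible policy. The 1.8 million exposed viewers per arm per day follows from the constraints (12 million x 30% x 50%).

```python
import math
from statistics import NormalDist

def sample_size_per_arm(baseline, relative_mde, alpha=0.05, power=0.80):
    """Users per arm for a two-sided two-proportion z-test (normal approximation)."""
    treated = baseline * (1 + relative_mde)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = baseline * (1 - baseline) + treated * (1 - treated)
    return math.ceil((z_alpha + z_power) ** 2 * variance / (treated - baseline) ** 2)

def decide(ci_low, ci_high, min_practical_lift, guardrail_breached):
    """Map a confidence interval on the absolute lift to an action (one possible policy)."""
    if guardrail_breached:
        return "dont_ship"                      # outcome (d)
    significant = ci_low > 0
    if significant and ci_low >= min_practical_lift:
        return "ship"                           # outcome (a)
    if significant and ci_high < min_practical_lift:
        return "dont_ship"                      # outcome (b)
    return "iterate"                            # outcome (c) or an ambiguous interval

# Hypothetical inputs: 5% baseline rate, 2% relative MDE.
n = sample_size_per_arm(baseline=0.05, relative_mde=0.02)
exposed_per_arm_per_day = 1_800_000             # 12M x 30% x 50%
days_needed = math.ceil(n / exposed_per_arm_per_day)

print(f"Required users per arm: {n:,}")
print(f"Days of traffic needed: {days_needed}")
print(decide(0.0015, 0.0025, 0.0010, guardrail_breached=False))
```

Under these assumed inputs the traffic requirement is met very quickly, so the runtime is more likely to be driven by the need to cover full weekly cycles and novelty effects than by statistical power. Comparing the whole confidence interval to the practical threshold, instead of only checking the p-value, is what keeps outcome (b) from being shipped.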