Context
The Facebook Feed team is testing a new comment composer treatment that makes it easier to leave a quick reaction-plus-comment on posts. Early internal results suggest it may increase commenting, but leadership wants a rigorous experiment before launch.
Hypothesis Seed
The product hypothesis is that reducing friction in the composer will increase meaningful comment creation per Feed viewer. However, the team is worried that a statistically significant lift in comments could still be a bad launch if it comes with lower session quality, more spammy interactions, or only a trivial gain that is not worth added complexity.
Constraints
- Eligible traffic: 12 million Facebook Feed viewers per day globally
- Maximum experiment duration: 14 days, because the team must make a launch decision before the next quarterly planning cycle
- Allocation target: 50/50 after a brief instrumentation ramp
- False positives are costly because low-quality engagement can hurt long-term Feed experience
- False negatives are also costly because comment creation is a strategic engagement goal
- The team will monitor safety metrics daily, but the primary readout must follow a pre-registered plan
Deliverables
- Define the null and alternative hypotheses, including whether the primary test should be one-sided or two-sided.
- Choose a primary metric, 2-4 guardrail metrics, and 1-3 secondary metrics. State the MDE explicitly and explain why a statistically significant result may still not justify shipping.
- Calculate the required sample size per arm using the provided baseline assumptions, and translate that into expected runtime given available traffic.
- Specify the experiment design: unit of randomization, allocation, duration, stratification, and how you will handle peeking, multiple comparisons, and any mismatch between unit of randomization and analysis.
- State a clear ship / don’t-ship / iterate rule that respects guardrails and addresses pitfalls such as novelty effects, sample ratio mismatch, and interference across users.
Use these planning assumptions for the primary metric: baseline meaningful comment creation rate = 8.0% per viewer-day, target MDE = +2.0% relative, alpha = 0.05, power = 80%.