Context
The Instagram Reels ranking team has launched an A/B test in Meta’s experimentation platform for a new Reels ranker. Early reads show a +3% watch-time lift in treatment, and leadership wants to know whether this is a real product win or just noise.
Hypothesis Seed
The new ranker reorders IG Reels using updated engagement features, including stronger weighting on predicted long-watch and IG Save propensity. The team believes this should improve the AARRR engagement layer for existing users by increasing Reels watch time without hurting downstream quality signals such as session exits, negative feedback, or creator ecosystem health.
Constraints
- Eligible traffic: 12M daily active Reels viewers globally
- Max experiment duration: 14 days; a ship/no-ship decision is required by then
- Initial ramp: 5% for 1 day, then a 50/50 split if no severe issues are detected
- False positives are costly because a bad ranker can degrade user experience at Meta scale; false negatives are also costly because ranking launches are expensive and delayed launches slow Reels growth
- You may use CUPED with 7-day pre-experiment watch-time data if you justify it
- Assume baseline daily per-user Reels watch time is 24.0 minutes, with a standard deviation of 60 minutes due to heavy-tailed usage
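As a rough illustration of how the baseline figures above feed a power calculation, the following Python sketch applies the standard two-sample normal-approximation formula for users per arm. Only the 24.0-minute mean, the 60-minute standard deviation, and the 12M eligible users come from the constraints. The 1% relative MDE, the two-sided alpha of 0.05, the 80% power, and the CUPED correlation of 0.5 are illustrative assumptions, and the function and variable names are hypothetical.

```python
import math
from statistics import NormalDist

# Values taken from the constraints above.
BASELINE_MEAN_MIN = 24.0
BASELINE_SD_MIN = 60.0
DAILY_ELIGIBLE_USERS = 12_000_000


def required_users_per_arm(sd, abs_mde, alpha=0.05, power=0.80, cuped_rho=0.0):
    """Users per arm for a two-sided, two-sample z-test on a mean.

    cuped_rho is the assumed correlation between the pre-experiment
    covariate and the in-experiment metric; CUPED scales the variance
    by (1 - cuped_rho**2).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    adjusted_var = sd ** 2 * (1 - cuped_rho ** 2)
    return math.ceil(2 * (z_alpha + z_power) ** 2 * adjusted_var / abs_mde ** 2)


# ASSUMPTION: a 1% relative MDE; the pre-registered value is not given here.
REL_MDE = 0.01
abs_mde = REL_MDE * BASELINE_MEAN_MIN  # 0.24 minutes

n_plain = required_users_per_arm(BASELINE_SD_MIN, abs_mde)
# ASSUMPTION: rho = 0.5 between 7-day pre-period and in-experiment watch time.
n_cuped = required_users_per_arm(BASELINE_SD_MIN, abs_mde, cuped_rho=0.5)

# Users available per arm per day once the split is 50/50.
daily_users_per_arm = DAILY_ELIGIBLE_USERS // 2

print(f"users per arm, no CUPED: {n_plain:,}")
print(f"users per arm, CUPED:    {n_cuped:,}")
print(f"available per arm/day:   {daily_users_per_arm:,}")
```

Under these assumed inputs the requirement is a little under 1M users per arm without CUPED, which is below the 6M users per arm available per day at a 50/50 split. This addresses statistical power only. Novelty effects and day-of-week patterns are separate reasons to run for longer than the power calculation alone suggests.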
Task
- State the null and alternative hypotheses for whether the observed +3% watch-time lift is real, and define the primary metric, guardrails, and at least one secondary metric using Meta vocabulary.
- Choose the unit of randomization and unit of analysis, explain whether they should match, and discuss risks such as network interference, SUTVA violations, and novelty/primacy effects on IG Reels.
- Compute the required sample size for a pre-registered MDE, translate it into runtime using the traffic above, and explain whether a 14-day test is sufficient. If you use CUPED, state how it changes variance assumptions.
- Write the analysis plan: test statistic, confidence interval, peeking policy, multiple-comparison policy for guardrails/secondaries, and how you will check for Sample Ratio Mismatch (SRM).
- Give a clear ship / don’t ship / iterate decision rule for the case where watch time is up 3% but one or more guardrails move in the wrong direction.
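One way the SRM check in the analysis plan could be sketched is a chi-square goodness-of-fit test on the assigned user counts, shown below in Python using only the standard library. The function name, the example counts, and the 0.001 alert threshold are assumptions for illustration, not values from this prompt.

```python
import math


def srm_p_value(n_control, n_treatment, expected_treatment_share=0.5):
    """P-value of a chi-square goodness-of-fit test with 1 degree of freedom.

    Compares observed assignment counts against the intended split.
    For 1 degree of freedom the survival function is erfc(sqrt(chi2 / 2)).
    """
    total = n_control + n_treatment
    expected_treatment = total * expected_treatment_share
    expected_control = total - expected_treatment
    chi2 = (
        (n_control - expected_control) ** 2 / expected_control
        + (n_treatment - expected_treatment) ** 2 / expected_treatment
    )
    return math.erfc(math.sqrt(chi2 / 2))


# ASSUMPTION: a strict threshold, since SRM is a data-quality alarm
# rather than a product metric.
SRM_ALPHA = 0.001

# Hypothetical counts for a 50/50 split.
p = srm_p_value(6_000_000, 6_020_000)
if p < SRM_ALPHA:
    print(f"SRM detected (p = {p:.2e}); investigate before reading metrics")
else:
    print(f"No SRM detected (p = {p:.3f})")
```

If the check fires, the usual response is to treat the metric reads as untrustworthy until the assignment or logging issue is found, rather than to adjust the analysis around it.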