Context
The Instagram Reels team wants to test a new ranking change that is expected to increase short-term engagement, especially the IG Save rate, but leadership is worried that optimizing for immediate engagement could hurt long-term retention. You are asked to design an experiment with a persistent holdout that measures long-term effects before a broad launch.
Hypothesis Seed
The proposed Reels ranker surfaces more “save-worthy” content using a new prediction feature. The team believes this will improve the AARRR engagement funnel at the Activation/Retention stages: more users save Reels now, and a subset returns more often over the next 8 weeks. However, there are also risks: a novelty effect (a temporary lift that fades as users acclimate), creator-ecosystem shifts, and spillovers through sharing.
Constraints
- Eligible population: 24M daily active Instagram users globally who watch at least 1 Reel/day
- Traffic budget: at most 10% of eligible users can be withheld from the new ranker as a long-term holdout because product leadership wants fast rollout
- Initial readout window: at most 14 days for the short-term ship decision
- Long-term holdout measurement window: 8 weeks after launch
- False positives are costly because shipping a ranker that boosts short-term saves but harms retention would be hard to unwind
- Meta experimentation stack supports user-level randomization, CUPED using 28-day pre-experiment covariates, and automated SRM checks
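The automated SRM check mentioned above can be sketched as a one-sample chi-square test of the observed arm counts against the planned allocation. The counts and the 90/10 split below are illustrative assumptions, not numbers from the brief:

```python
# Sketch of an SRM (sample-ratio mismatch) check: compare observed arm
# counts against the planned allocation with a chi-square statistic.
def srm_chi_square(observed, expected_ratios):
    """Chi-square statistic for observed arm counts vs. planned ratios."""
    total = sum(observed)
    expected = [r * total for r in expected_ratios]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical 90/10 treatment/holdout split. A statistic above the
# df=1 critical value (3.84 at alpha = 0.05) flags a mismatch that
# should block the readout until the cause is found.
stat_ok = srm_chi_square([9_001_000, 999_000], [0.9, 0.1])   # small noise
stat_bad = srm_chi_square([9_100_000, 900_000], [0.9, 0.1])  # real skew
```

In practice the threshold is often set far stricter than 0.05 (e.g. p < 0.001), because with millions of users even tiny logging bugs produce decisive evidence of a broken split.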
Deliverables
- Define the null and alternative hypotheses for both the 14-day launch readout and the 8-week holdout readout.
- Specify the primary metric, secondary metrics, and guardrails, including an explicit MDE (minimum detectable effect) and why IG Save should or should not be the primary metric.
- Calculate the required sample size for the long-term holdout using real numbers, and translate it into feasibility given the available traffic.
- Choose the unit of randomization, allocation, and duration; explain how you would use CUPED and how you would pre-register the analysis plan.
- State a clear ship / don’t-ship / iterate rule that respects guardrails and addresses pitfalls such as peeking, novelty effect, SRM, and network interference.
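For the sample-size deliverable, a standard two-proportion z-test sizing formula is enough to check feasibility. The baseline retention rate (20%) and the 0.5pp absolute MDE below are illustrative assumptions for the sketch; the real values would come from historical data:

```python
import math

def n_per_arm(p_base, mde_abs, z_alpha=1.96, z_beta=0.84):
    """Per-arm sample size for a two-sided two-proportion z-test
    (z_alpha = 1.96 for alpha = 0.05; z_beta = 0.84 for 80% power)."""
    p_bar = p_base + mde_abs / 2           # average rate across arms
    var = 2 * p_bar * (1 - p_bar)          # pooled-variance approximation
    return math.ceil((z_alpha + z_beta) ** 2 * var / mde_abs ** 2)

# Assumed baseline 8-week retention of 20%, MDE of 0.5pp absolute.
n = n_per_arm(0.20, 0.005)

# Feasibility: the 10% holdout of 24M eligible DAU gives 2.4M users.
holdout_size = int(24_000_000 * 0.10)
feasible = n <= holdout_size
```

Under these assumptions the required n (roughly 100k per arm) fits comfortably inside the 2.4M-user holdout, leaving room for a tighter MDE or segment-level readouts.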
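The CUPED adjustment named in the constraints can be sketched in a few lines: regress the in-experiment metric on a 28-day pre-experiment covariate and subtract the predicted component. The simulated data below is purely illustrative:

```python
import numpy as np

# Minimal CUPED sketch: adjust in-experiment metric y using a
# pre-experiment covariate x (e.g. 28-day pre-period saves/user).
rng = np.random.default_rng(0)
x = rng.normal(5.0, 2.0, 50_000)             # simulated pre-period metric
y = 0.6 * x + rng.normal(2.0, 1.0, 50_000)   # simulated in-experiment metric

theta = np.cov(y, x)[0, 1] / np.var(x, ddof=1)  # OLS slope of y on x
y_cuped = y - theta * (x - x.mean())            # variance-reduced metric

# Variance drops by roughly corr(x, y)^2; the mean is unchanged,
# so the treatment-effect estimate is unbiased but more precise.
reduction = 1 - y_cuped.var() / y.var()
```

The practical payoff is that the same MDE is reachable with fewer users, or a smaller MDE with the same holdout, which matters when the holdout is capped at 10%.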