Context
The Instagram Reels team tested a new ranking feature that adds a lightweight “save-worthy” prior to the feed ranker, with the goal of increasing IG Save on Reels. The first readout from Meta’s experimentation platform shows a statistically significant lift in saves, but the effect size is tiny.
Hypothesis Seed
Product believes surfacing more “referenceable” or “re-watchable” Reels will improve the engagement stage of the AARRR funnel for retained users by increasing saves, with possible downstream gains in session depth and 7-day retention. However, a tiny win may not justify the added ranking complexity, infra cost, or any degradation in watch-time quality.
Constraints
- Eligible traffic: 12M daily Reels viewers globally
- Max experiment window: 14 days before the Reels ranking roadmap decision
- Randomization must happen in Meta’s standard experiment framework using a persistent treatment assignment
- False positives are costly because ranking changes are expensive to maintain and can degrade user experience at scale
- False negatives are also meaningful because even small engagement lifts can matter on Reels, but only if they clear practical significance and guardrails
- You may use CUPED with 14-day pre-experiment user-level save behavior to reduce variance (a minimal adjustment sketch follows this list)
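A minimal sketch of the CUPED adjustment and the downstream significance test, assuming per-user arrays of in-experiment saves (`y`) and 14-day pre-period saves (`x`). Estimating theta on pooled data and running a Welch t-test on the adjusted metric is one reasonable choice, not necessarily what Meta's platform implements:

```python
import numpy as np
from scipy import stats

def cuped_adjust(y, x):
    """CUPED: subtract the part of y predicted by the pre-period
    covariate x. theta = cov(x, y) / var(x); the variance of the
    adjusted metric shrinks by (1 - rho^2), where rho = corr(x, y)."""
    theta = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    return y - theta * (x - x.mean())

def cuped_ttest(y_t, x_t, y_c, x_c):
    """Welch t-test on the CUPED-adjusted metric. theta is estimated
    on pooled data so the adjustment is identical in both arms."""
    y = np.concatenate([y_t, y_c])
    x = np.concatenate([x_t, x_c])
    theta = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    adj_t = y_t - theta * (x_t - x.mean())
    adj_c = y_c - theta * (x_c - x.mean())
    return stats.ttest_ind(adj_t, adj_c, equal_var=False)
```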
Task
- Define the experiment: hypothesis, primary metric, guardrails, secondary metrics, unit of randomization, and an explicit minimum detectable effect (MDE).
- Compute the required sample size and expected duration with concrete numbers, and show how CUPED would shrink the variance and the required N (a worked sketch follows this list).
- Write a pre-registered analysis plan covering the significance test, peeking policy, multiple-comparisons correction, and how you would check for Sample Ratio Mismatch (SRM); an SRM check sketch also follows this list.
- Explain how you would interpret a result that is statistically significant but below the pre-registered MDE, including whether to ship, iterate, or run a follow-up.
- Name the main pitfalls for this Reels test, including novelty and primacy effects (which pull in opposite directions) and any network or interference risks relevant to Meta surfaces, such as Reels resharing or cross-posting into Facebook Groups.
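For the sample-size bullet, a worked sketch using the standard two-proportion z-test formula. The 2% baseline save rate and 1% relative MDE below are placeholders, not Meta numbers; substitute the real baseline from the pre-period:

```python
import math
from scipy.stats import norm

def n_per_arm(p0, rel_mde, alpha=0.05, power=0.80):
    """Users per arm for a two-sided two-proportion z-test
    (normal approximation)."""
    p1 = p0 * (1 + rel_mde)
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    var = p0 * (1 - p0) + p1 * (1 - p1)
    return math.ceil(z**2 * var / (p1 - p0) ** 2)

base = n_per_arm(0.02, 0.01)            # ~7.7M users per arm with these inputs
cuped = math.ceil(base * (1 - 0.5**2))  # rho = 0.5 -> 25% fewer: ~5.8M per arm

# Duration: with 12M daily Reels viewers split 50/50, each arm sees ~6M
# users on day one, but cumulative *unique* users grow sublinearly, so
# compare the unique-user accrual curve against N, not daily traffic.
```

Note how sensitive the answer is to rho: at rho = 0.7 the CUPED factor is 1 - 0.49 = 0.51, roughly halving the requirement, which can be the difference between fitting inside the 14-day window or not.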
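And a minimal SRM check for the analysis-plan bullet, assuming a 50/50 intended split. A chi-square goodness-of-fit test with a strict threshold (commonly p < 0.001) is standard practice, though the exact threshold is a team convention:

```python
from scipy.stats import chisquare

def srm_check(n_control, n_treatment, ratio=0.5, threshold=1e-3):
    """Flag Sample Ratio Mismatch: test observed assignment counts
    against the intended split. A hit usually means an assignment or
    logging bug, so halt the analysis rather than interpret metrics."""
    total = n_control + n_treatment
    expected = [total * ratio, total * (1 - ratio)]
    _, p = chisquare([n_control, n_treatment], f_exp=expected)
    return p, p < threshold

# Example: a 0.2% imbalance is invisible to the eye but decisive at scale.
p, flagged = srm_check(6_012_000, 5_988_000)  # p << 0.001 -> flagged
```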