Context
The Instagram Reels team at Meta wants to test a new ranking feature that surfaces more “save-worthy” Reels by upweighting predicted long-term value. The launch decision depends on whether the experiment improves downstream engagement without harming the broader AARRR funnel.
Hypothesis Seed
The team believes the new ranker will increase IG Save rate on Reels because users will see more collectible or revisit-worthy content. However, the change may also alter creator distribution, session depth, and social interactions, so the choice of unit of randomization is critical.
Constraints
- Eligible traffic: 18M daily Reels viewers globally
- Average eligible exposure: 2.4 Reels sessions per viewer per day
- Current baseline IG Save rate per viewer-day: 8.0%
- Maximum experiment duration: 14 days, including a 1-day ramp
- Decision deadline: must recommend ship / don’t ship / iterate by day 15
- False positives are costly because a bad ranker can degrade Reels retention and creator ecosystem health; false negatives are also costly because Reels is a strategic surface
- The Meta experimentation platform supports user-level assignment, creator-level assignment, and geo holdouts; CUPED is available, using 14-day pre-experiment viewer behavior as the covariate
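The baseline and traffic figures above are enough to sketch a rough sample-size calculation. The block below is a minimal illustration, not a prescribed method. It assumes viewer-level randomization, a two-sided two-proportion z-test, and one Bernoulli outcome per viewer at the 8.0% baseline. The 1% relative MDE, alpha of 0.05, power of 0.80, and CUPED correlation of 0.5 are hypothetical placeholders, since the constraints do not specify them. The function names are illustrative.

```python
from math import ceil
from statistics import NormalDist

DAILY_ELIGIBLE_VIEWERS = 18_000_000  # from the constraints
BASELINE_SAVE_RATE = 0.08            # from the constraints (per viewer-day)


def required_n_per_arm(baseline, relative_mde, alpha=0.05, power=0.80):
    """Viewers needed per arm for a two-sided two-proportion z-test.

    Assumption: each viewer contributes one Bernoulli outcome. A real
    analysis would use the empirical variance of the viewer-level metric.
    """
    p1 = baseline
    p2 = baseline * (1.0 + relative_mde)
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_power = NormalDist().inv_cdf(power)
    variance_sum = p1 * (1.0 - p1) + p2 * (1.0 - p2)
    return ceil((z_alpha + z_power) ** 2 * variance_sum / (p2 - p1) ** 2)


def cuped_adjusted_n(n, rho):
    """CUPED scales metric variance by (1 - rho^2), and so the required n.

    rho is the correlation between the pre-experiment covariate and the
    in-experiment metric; it must be estimated from data.
    """
    return ceil(n * (1.0 - rho ** 2))


if __name__ == "__main__":
    n_raw = required_n_per_arm(BASELINE_SAVE_RATE, relative_mde=0.01)  # hypothetical MDE
    n_cuped = cuped_adjusted_n(n_raw, rho=0.5)                         # hypothetical rho
    for label, n in (("without CUPED", n_raw), ("with CUPED", n_cuped)):
        share = n / DAILY_ELIGIBLE_VIEWERS
        print(f"{label}: {n:,} viewers per arm "
              f"({share:.1%} of one day's eligible viewers)")
```

Converting the per-arm count into runtime still depends on the allocation percentage, the ramp day, and how many distinct viewers accumulate over the experiment window, none of which this sketch models.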
Deliverables
- Choose and justify the unit of randomization for this Reels experiment. Explicitly compare at least two alternatives (for example: viewer_id, creator_id, or geo) and discuss SUTVA / network interference trade-offs.
- Define the primary metric, 2-4 guardrails, and at least one secondary metric using Meta vocabulary (for example IG Save, Reels session depth, retention, creator distribution).
- Calculate the required sample size for a pre-registered MDE and convert it into expected runtime given the traffic constraints. State how CUPED changes variance and effective runtime.
- Write the analysis plan: test choice, SRM checks, peeking policy, novelty effect handling, and a ship / don’t-ship rule that respects guardrails.
- Call out the top pitfalls specific to this design, especially when the unit of analysis differs from the unit of randomization.
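As one illustration of the SRM check named in the analysis-plan deliverable, the sketch below runs a chi-square goodness-of-fit test on the observed arm sizes. It assumes a two-arm design with a known intended split. The function name `srm_p_value` is hypothetical, and the 0.001 alert threshold is a common convention rather than anything stated here.

```python
from math import erfc, sqrt


def srm_p_value(n_control, n_treatment, expected_treatment_share=0.5):
    """Chi-square (1 degree of freedom) p-value for sample ratio mismatch.

    Compares observed arm sizes with the sizes implied by the intended
    split. For 1 degree of freedom the chi-square survival function
    equals erfc(sqrt(chi2 / 2)).
    """
    total = n_control + n_treatment
    expected_treatment = total * expected_treatment_share
    expected_control = total - expected_treatment
    chi2 = ((n_control - expected_control) ** 2 / expected_control
            + (n_treatment - expected_treatment) ** 2 / expected_treatment)
    return erfc(sqrt(chi2 / 2.0))


if __name__ == "__main__":
    SRM_ALPHA = 0.001  # assumed alert threshold
    # Made-up counts, for illustration only.
    p = srm_p_value(n_control=1_000_000, n_treatment=1_010_000)
    print(f"SRM p-value: {p:.3g}; mismatch flagged: {p < SRM_ALPHA}")
```

A flagged mismatch would normally block interpretation of the primary metric and guardrails until the assignment or logging problem is explained.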