Context
The Instagram Reels team wants to test a new ranking change that is expected to increase short-term engagement, especially the IG Save rate, but leadership is worried that optimizing for immediate engagement could hurt long-term retention. You are asked to design an experiment with a persistent holdout that measures long-term effects before a broad launch.
Hypothesis Seed
The proposed Reels ranker surfaces more “save-worthy” content using a new prediction feature. The team believes this will improve the AARRR engagement funnel at the Activation/Retention stages: more users save Reels now, and a subset returns more often over the next 8 weeks. However, there are also risks: a novelty effect (a temporary lift that fades as users acclimate), creator-ecosystem shifts, and spillovers through sharing.
Constraints
- Eligible population: 24M daily active Instagram users globally who watch at least 1 Reel/day
- Traffic budget: at most 10% of eligible users can be withheld from the new ranker as a long-term holdout because product leadership wants fast rollout
- Initial readout window: at most 14 days for the short-term ship decision
- Long-term holdout measurement window: 8 weeks after launch
- False positives are costly because shipping a ranker that boosts short-term saves but harms retention would be hard to unwind
- Meta experimentation stack supports user-level randomization, CUPED using 28-day pre-experiment covariates, and automated SRM checks
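The automated SRM check mentioned above can be sketched as a one-sample chi-square test of the observed arm counts against the planned allocation. The counts and the 90/10 split below are illustrative assumptions, not numbers from the brief:

```python
# Sketch of an SRM (sample-ratio mismatch) check: compare observed arm
# counts against the planned allocation with a chi-square statistic.
def srm_chi_square(observed, expected_ratios):
    """Chi-square statistic for observed arm counts vs. planned ratios."""
    total = sum(observed)
    expected = [r * total for r in expected_ratios]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical 90/10 treatment/holdout split. A statistic above the
# df=1 critical value (3.84 at alpha = 0.05) flags a mismatch that
# should block the readout until the cause is found.
stat_ok = srm_chi_square([9_001_000, 999_000], [0.9, 0.1])   # small noise
stat_bad = srm_chi_square([9_100_000, 900_000], [0.9, 0.1])  # real skew
```

In practice the threshold is often set far stricter than 0.05 (e.g. p < 0.001), because with millions of users even tiny logging bugs produce decisive evidence of a broken split.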
Deliverables
- Define the null and alternative hypotheses for both the 14-day launch readout and the 8-week holdout readout.
- Specify the primary metric, secondary metrics, and guardrails, including an explicit MDE (minimum detectable effect) and why IG Save should or should not be the primary metric.
- Calculate the required sample size for the long-term holdout using real numbers, and translate it into feasibility given the available traffic.
- Choose the unit of randomization, allocation, and duration; explain how you would use CUPED and how you would pre-register the analysis plan.
- State a clear ship / don’t-ship / iterate rule that respects guardrails and addresses pitfalls such as peeking, novelty effect, SRM, and network interference.
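For the sample-size deliverable, a standard two-proportion z-test sizing formula is enough to check feasibility. The baseline retention rate (20%) and the 0.5pp absolute MDE below are illustrative assumptions for the sketch; the real values would come from historical data:

```python
import math

def n_per_arm(p_base, mde_abs, z_alpha=1.96, z_beta=0.84):
    """Per-arm sample size for a two-sided two-proportion z-test
    (z_alpha = 1.96 for alpha = 0.05; z_beta = 0.84 for 80% power)."""
    p_bar = p_base + mde_abs / 2           # average rate across arms
    var = 2 * p_bar * (1 - p_bar)          # pooled-variance approximation
    return math.ceil((z_alpha + z_beta) ** 2 * var / mde_abs ** 2)

# Assumed baseline 8-week retention of 20%, MDE of 0.5pp absolute.
n = n_per_arm(0.20, 0.005)

# Feasibility: the 10% holdout of 24M eligible DAU gives 2.4M users.
holdout_size = int(24_000_000 * 0.10)
feasible = n <= holdout_size
```

Under these assumptions the required n (roughly 100k per arm) fits comfortably inside the 2.4M-user holdout, leaving room for a tighter MDE or segment-level readouts.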
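The CUPED adjustment named in the constraints can be sketched in a few lines: regress the in-experiment metric on a 28-day pre-experiment covariate and subtract the predicted component. The simulated data below is purely illustrative:

```python
import numpy as np

# Minimal CUPED sketch: adjust in-experiment metric y using a
# pre-experiment covariate x (e.g. 28-day pre-period saves/user).
rng = np.random.default_rng(0)
x = rng.normal(5.0, 2.0, 50_000)             # simulated pre-period metric
y = 0.6 * x + rng.normal(2.0, 1.0, 50_000)   # simulated in-experiment metric

theta = np.cov(y, x)[0, 1] / np.var(x, ddof=1)  # OLS slope of y on x
y_cuped = y - theta * (x - x.mean())            # variance-reduced metric

# Variance drops by roughly corr(x, y)^2; the mean is unchanged,
# so the treatment-effect estimate is unbiased but more precise.
reduction = 1 - y_cuped.var() / y.var()
```

The practical payoff is that the same MDE is reachable with fewer users, or a smaller MDE with the same holdout, which matters when the holdout is capped at 10%.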