Context
The Instagram Reels team is testing a new save-forwarding UI: after a user taps IG Save on a Reel, the app briefly surfaces a richer confirmation tray with related Reels and a stronger visual reward. PMs believe this could strengthen the activation-to-retention path of the AARRR funnel by making saving feel more valuable. They are concerned, however, that any early lift may be a novelty effect rather than durable behavior change, and that a primacy effect (habituated users initially resisting the changed flow) could distort early readings in the opposite direction.
Hypothesis Seed
The treatment may increase the probability that a Reel viewer saves at least one Reel in a day because the new UI is more salient and rewarding. However, users may respond strongly only while the experience is new, producing an inflated week-1 lift that fades by week 2. Design the experiment to measure both the immediate effect and whether that effect persists once the novelty wears off.
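Here is a minimal sketch of how this hypothesis could be made testable. It assumes the primary metric is the share of units (users or user-days) with at least one IG Save, estimated separately for week 1 and week 2. The helper names `diff_in_proportions` and `novelty_readout` are hypothetical. The standard error is the unpooled normal approximation, which assumes independent units; a user-day analysis would need a clustered or delta-method variance instead. The durable-lift hypothesis is tested on week 2 alone (one-sided). `retained_fraction` (week-2 lift divided by week-1 lift) only describes how much of the early lift survives; it is not itself a test.

```python
from math import sqrt
from statistics import NormalDist


def diff_in_proportions(x_t: int, n_t: int, x_c: int, n_c: int) -> tuple[float, float]:
    """Absolute lift (treatment rate minus control rate) and its unpooled standard error."""
    p_t, p_c = x_t / n_t, x_c / n_c
    se = sqrt(p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c)
    return p_t - p_c, se


def novelty_readout(week1: tuple[int, int, int, int],
                    week2: tuple[int, int, int, int],
                    alpha: float = 0.05) -> dict:
    """Compare week-1 and week-2 lifts on "saved at least one Reel".

    Each week is (savers_treatment, units_treatment, savers_control, units_control).
    Hypothetical helper: the durable-lift test is a one-sided z-test on week 2 only.
    """
    lift1, _ = diff_in_proportions(*week1)
    lift2, se2 = diff_in_proportions(*week2)
    z_crit = NormalDist().inv_cdf(1 - alpha)  # one-sided critical value
    return {
        "week1_lift": lift1,
        "week2_lift": lift2,
        # H1 (durable lift): the week-2 lift is greater than zero
        "week2_significant": se2 > 0 and lift2 / se2 > z_crit,
        # Share of the week-1 lift still present in week 2 (descriptive only)
        "retained_fraction": lift2 / lift1 if lift1 != 0 else float("nan"),
    }


# Illustrative, made-up counts: a 0.5 pp week-1 lift decaying to 0.1 pp in week 2.
readout = novelty_readout(week1=(33_000, 600_000, 30_000, 600_000),
                          week2=(30_600, 600_000, 30_000, 600_000))
print(readout)
```

Testing week 2 on its own, rather than pooling both weeks, keeps a large novelty spike in week 1 from pushing a weak or absent week-2 effect across the significance line.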
Constraints
- Eligible traffic: 6,000,000 daily active Reels viewers globally
- Only 20% of eligible traffic can be exposed initially due to UX risk; for this launch candidate, assume a steady-state allocation of 10% control / 10% treatment, with the remaining 80% held out (not enrolled)
- Maximum decision window: 14 days
- False positives are costly: shipping a novelty-only UI adds product complexity and may clutter the Reels viewing surface
- False negatives are tolerable up to a point; the team only wants to ship if the effect is both durable and large enough to matter
- The Meta experimentation stack supports CUPED (using 14 days of pre-experiment user history as the covariate) and standard sample ratio mismatch (SRM) monitoring
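The traffic constraints can be turned into a rough sample size and runtime as shown below. This is a sketch under stated assumptions. The 5% baseline daily save rate, the 2% relative MDE, the 30% CUPED variance reduction, and all function names are hypothetical placeholders, not figures from this scenario. CUPED enters the planning step as a variance multiplier of (1 - rho^2), where rho is the correlation between the metric and its pre-experiment value. `cuped_adjust` shows the per-user adjustment itself.

```python
import math
from statistics import NormalDist

# Traffic figures from the constraints; each arm gets 10% of eligible DAU.
ELIGIBLE_DAU = 6_000_000
PER_ARM_DAILY = ELIGIBLE_DAU * 10 // 100  # 600,000 active users per arm per day
MIN_DAYS = 14  # two full weeks are needed to compare week-1 and week-2 lift


def n_per_arm(p0: float, rel_mde: float, alpha: float = 0.05, power: float = 0.8,
              variance_reduction: float = 0.0) -> int:
    """Units per arm for a two-sided, two-proportion z-test.

    variance_reduction is the fraction of variance CUPED is assumed to remove
    (rho**2); 0.0 means no CUPED.
    """
    p1 = p0 * (1 + rel_mde)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    var = (p0 * (1 - p0) + p1 * (1 - p1)) * (1 - variance_reduction)
    return math.ceil((z_alpha + z_beta) ** 2 * var / (p1 - p0) ** 2)


def runtime_days(n: int, per_arm_daily: int = PER_ARM_DAILY, min_days: int = MIN_DAYS) -> int:
    """Days to reach n per arm, floored at the two-week novelty readout.

    Optimistic simplification: treats each day's per-arm actives as new units.
    Returning users make real unique-user accrual slower than this.
    """
    return max(math.ceil(n / per_arm_daily), min_days)


def cuped_adjust(y: list[float], x: list[float]) -> list[float]:
    """CUPED: y_adj = y - theta * (x - mean(x)), with theta = cov(y, x) / var(x).

    x is each user's pre-experiment value of the metric (e.g. 14-day save rate).
    theta should be estimated on treatment and control pooled together.
    """
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    theta = sxy / sxx  # the (n - 1) denominators cancel
    return [yi - theta * (xi - mean_x) for xi, yi in zip(x, y)]


# Hypothetical inputs: 5% baseline daily save rate, 2% relative MDE.
n_plain = n_per_arm(p0=0.05, rel_mde=0.02)
n_cuped = n_per_arm(p0=0.05, rel_mde=0.02, variance_reduction=0.3)
print(n_plain, n_cuped, runtime_days(n_plain))
```

With these placeholder inputs, roughly two days of per-arm traffic meet the power requirement. The binding constraint is therefore the 14-day window needed to observe week 2 at all, not statistical power.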
Deliverables
- Define novelty effect and primacy effect in this Meta scenario, and state a testable hypothesis that separates short-term lift from durable lift.
- Choose the primary metric, 2-4 guardrails, and at least one secondary metric using Meta vocabulary (for example IG Save, Reels engagement, retention in the AARRR funnel).
- Compute the required sample size with an explicit MDE, then translate it into runtime under the traffic constraints.
- Specify unit of randomization, allocation, duration, CUPED usage, and a pre-registered analysis plan including peeking and multiple-comparisons policy.
- State a ship / do-not-ship / iterate rule that respects guardrails and explains what you would do if week 1 is positive but week 2 decays materially.
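A hypothetical pre-registered decision rule could be encoded as shown below. The ordering follows the constraints above: SRM gate first, then guardrails, then the durable week-2 lift. The SRM threshold of p < 0.001, the 50% `min_retained` cutoff, and the `decide` signature are illustrative assumptions. `mde_abs` would be the absolute MDE fixed before launch. A significant positive week-1 lift that does not survive into week 2 returns "iterate" rather than "ship".

```python
import math


def srm_p_value(n_treatment: int, n_control: int) -> float:
    """Chi-square test (1 df) that assignment matches the intended 50/50 split."""
    expected = (n_treatment + n_control) / 2
    chi2 = ((n_treatment - expected) ** 2 + (n_control - expected) ** 2) / expected
    # For 1 degree of freedom, P(chi2 >= x) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(chi2 / 2))


def decide(*, n_treatment: int, n_control: int, guardrails_ok: bool,
           week1_significant: bool, week2_significant: bool,
           week2_lift: float, mde_abs: float, retained_fraction: float,
           min_retained: float = 0.5, srm_alpha: float = 0.001) -> str:
    """Hypothetical pre-registered ship rule; thresholds are illustrative."""
    if srm_p_value(n_treatment, n_control) < srm_alpha:
        return "invalid"  # sample ratio mismatch: debug assignment, don't read metrics
    if not guardrails_ok:
        return "do-not-ship"  # any guardrail breach blocks launch
    durable = (week2_significant and week2_lift >= mde_abs
               and retained_fraction >= min_retained)
    if durable:
        return "ship"
    if week1_significant:
        return "iterate"  # week-1 lift that decays by week 2 looks like novelty
    return "do-not-ship"
```

The rule checks SRM first because a mismatched split invalidates every downstream comparison. It checks guardrails before the primary metric so that a positive IG Save lift cannot override a breach.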