You work on a ride-sharing marketplace product where a newly launched rider app feature shows a strong early lift in engagement in an A/B test. The team is excited by the initial results, but you suspect users may simply be reacting to something new rather than adopting behavior that will persist.
How would you analyze whether the observed lift is a novelty effect rather than a durable product improvement? What experiment design and readout would you use to decide whether the feature should ship broadly?
Primary metric should reflect steady-state behavior, not launch-week excitement.
Guardrails should protect core rider outcomes and app quality.
Secondary cuts should help diagnose decay over time and by rider tenure.

Novelty can inflate early effects.
Peeking can turn a temporary spike into a false ship decision.
SRM or logging bugs can mimic treatment effects.
User-level randomization may still face marketplace interference.
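
Two of the checks above lend themselves to a short sketch: an SRM check on assignment counts, and a readout of relative lift by week since each rider's first exposure, so that decay shows up as a downward trend. The following is a minimal illustration only, not a definitive implementation. The function names, the arm labels "control" and "treatment", the 50/50 expected split, and the (week, arm, value) row layout are all assumptions made for the example. The slope is a descriptive signal with no confidence interval attached, and the readout horizon should be fixed before the test starts to avoid the peeking problem noted above.

```python
import math
from statistics import mean


def srm_p_value(n_control, n_treatment, expected_treatment_share=0.5):
    """Chi-square test (1 degree of freedom) for sample ratio mismatch.

    A very small p-value suggests assignment or logging is broken, in which
    case the lift estimate should not be trusted at all.
    """
    total = n_control + n_treatment
    expected_treatment = total * expected_treatment_share
    expected_control = total - expected_treatment
    chi2 = ((n_treatment - expected_treatment) ** 2 / expected_treatment
            + (n_control - expected_control) ** 2 / expected_control)
    # With 1 degree of freedom, P(chi2 > x) equals erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def weekly_lift(rows):
    """Relative lift per week since first exposure.

    rows: iterable of (weeks_since_first_exposure, arm, metric_value),
    where arm is "control" or "treatment" (hypothetical layout).
    """
    by_cell = {}
    for week, arm, value in rows:
        by_cell.setdefault((week, arm), []).append(value)
    lifts = {}
    for week in sorted({w for w, _ in by_cell}):
        control_mean = mean(by_cell[(week, "control")])
        treatment_mean = mean(by_cell[(week, "treatment")])
        lifts[week] = (treatment_mean - control_mean) / control_mean
    return lifts


def lift_slope(lifts):
    """Least-squares slope of lift on week index.

    A clearly negative slope is consistent with a novelty effect;
    a flat slope is consistent with a durable improvement.
    """
    weeks = list(lifts)
    values = [lifts[w] for w in weeks]
    week_bar, value_bar = mean(weeks), mean(values)
    sxx = sum((w - week_bar) ** 2 for w in weeks)
    sxy = sum((w - week_bar) * (v - value_bar) for w, v in zip(weeks, values))
    return sxy / sxx


if __name__ == "__main__":
    # Made-up numbers, purely for illustration.
    print("SRM p-value:", srm_p_value(5000, 5300))
    rows = [
        (0, "control", 10), (0, "treatment", 12),
        (1, "control", 10), (1, "treatment", 11),
        (2, "control", 10), (2, "treatment", 10.5),
    ]
    lifts = weekly_lift(rows)
    print("Lift by week:", lifts)
    print("Slope:", lift_slope(lifts))
```

Running the same readout separately for new and long-tenured riders would give the tenure cut mentioned above. New riders have no prior experience of the app to compare against, so a lift that holds for them but fades for tenured riders is one common novelty signature.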