
You work on a consumer product team running an A/B test for a small UI change on a high-traffic surface. The experiment shows a statistically significant lift, but the estimated effect size is very small and close to the noise floor. Your team is unsure whether to trust the result enough to ship.
How would you decide whether to trust an experiment result when the observed effect size is small? What evidence would you look for before recommending ship, don’t ship, or rerun?
Statistical significance vs practical significanceConfidence interval width and overlap with meaningful effect sizesPower and whether the test was designed for such a small liftExperiment validity checks such as SRM and peeking riskGuardrail performance before shipping