You work on a consumer product team running an A/B test for a small UI change on a high-traffic surface. The experiment shows a statistically significant lift, but the estimated effect size is very small and close to the noise floor. Your team is unsure whether to trust the result enough to ship.
How would you decide whether to trust an experiment result when the observed effect size is small? What evidence would you look for before recommending ship, don’t ship, or rerun?
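To make the gap between "statistically significant" and "large enough to matter" concrete, here is a minimal sketch that assumes the metric is a conversion rate compared with a two-proportion z-test. The function names, the counts, the `min_practical_effect` threshold, and the three-way decision rule are all hypothetical illustrations, not part of the scenario above.

```python
import math
from statistics import NormalDist


def two_proportion_summary(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """Absolute lift (B minus A), two-sided p-value, and a Wald confidence interval."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    diff = p_b - p_a

    # Pooled standard error for the hypothesis test (null: no difference).
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se_pooled = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = diff / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Unpooled standard error for the confidence interval on the difference.
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return {
        "diff": diff,
        "p_value": p_value,
        "ci_low": diff - z_crit * se,
        "ci_high": diff + z_crit * se,
    }


def recommend(summary, min_practical_effect, alpha=0.05):
    """Hypothetical decision rule: compare the interval to the smallest lift worth shipping."""
    if summary["p_value"] >= alpha:
        return "rerun"  # assumption: an inconclusive test gets more data, not a ship
    if summary["ci_low"] >= min_practical_effect:
        return "ship"  # even the pessimistic end of the interval clears the bar
    if summary["ci_high"] < min_practical_effect:
        return "don't ship"  # even the optimistic end falls short of the bar
    return "rerun"  # interval straddles the bar, so the data cannot settle it


# Hypothetical counts: one million users per arm, 10.00% vs 10.10% conversion.
summary = two_proportion_summary(100_000, 1_000_000, 101_000, 1_000_000)
decision = recommend(summary, min_practical_effect=0.0005)
```

With these made-up counts the test is significant at the 5% level, yet the recommendation depends entirely on the threshold chosen for `min_practical_effect`. That is one reason to agree on the smallest lift worth shipping before looking at the result.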