You work on a consumer fintech product and run an A/B test on a new growth surface that produces a small positive lift on the headline conversion metric. The result may be statistically significant, and some stakeholders want to ship it, but you are concerned the gain may not justify rollout once user impact, trade-offs, and experiment quality are considered.
How would you explain why a test with a small lift might still not be worth shipping? What would you look at beyond statistical significance before making a launch recommendation?
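One way to make the gap between "statistically significant" and "worth shipping" concrete is to compare the confidence interval on the lift against a minimum lift the team would consider worth the rollout cost. The sketch below is illustrative only: the function names, the counts, and the `min_worthwhile_lift` threshold are all hypothetical and not taken from any real experiment. It uses a simple Wald interval for the difference in conversion rates and does not address guardrail metrics, user impact, or experiment quality.

```python
from math import sqrt
from statistics import NormalDist


def lift_interval(conv_c, n_c, conv_t, n_t, alpha=0.05):
    """Wald interval for the absolute lift (treatment rate - control rate)."""
    p_c = conv_c / n_c
    p_t = conv_t / n_t
    diff = p_t - p_c
    # Standard error of the difference between two independent proportions.
    se = sqrt(p_c * (1 - p_c) / n_c + p_t * (1 - p_t) / n_t)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return diff, diff - z * se, diff + z * se


def assess(conv_c, n_c, conv_t, n_t, min_worthwhile_lift, alpha=0.05):
    """Separate 'the interval excludes zero' from 'the lift clears a business bar'."""
    diff, lo, hi = lift_interval(conv_c, n_c, conv_t, n_t, alpha)
    return {
        "diff": diff,
        "ci": (lo, hi),
        # Significant if the interval excludes zero.
        "statistically_significant": lo > 0 or hi < 0,
        # Clears the bar only if even the lower bound meets the threshold.
        "clears_threshold": lo >= min_worthwhile_lift,
    }


# Invented numbers: 5.00% control vs 5.15% treatment, 200,000 users per arm,
# and an assumed minimum worthwhile absolute lift of 0.30 percentage points.
result = assess(
    conv_c=10_000, n_c=200_000,
    conv_t=10_300, n_t=200_000,
    min_worthwhile_lift=0.003,
)
print(result)
```

With these made-up inputs the interval excludes zero, so the test reads as significant, yet the whole interval sits below the assumed threshold. That is the situation the question describes: a real but small effect that may not pay for its own costs.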