
You are reviewing a product metric change after a launch or experiment. The team sees a difference between two groups and wants to know whether to trust the raw comparison or run a formal test.
How do you decide whether to use a simple metric comparison or a more rigorous statistical test?
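
To make the contrast concrete, here is a minimal sketch, assuming a binary conversion metric and two independent groups. The function names (`raw_lift`, `two_proportion_z_test`) and the sample counts are hypothetical illustrations, not part of the original scenario, and a pooled two-proportion z-test is only one of several tests that could apply here.

```python
import math


def raw_lift(conv_a, n_a, conv_b, n_b):
    """Raw difference in conversion rate (group B minus group A)."""
    return conv_b / n_b - conv_a / n_a


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided pooled z-test for a difference in two proportions.

    Returns (z, p_value). Assumes independent groups and samples large
    enough for the normal approximation to be reasonable.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero; the test is undefined")
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical data: the same 10% vs 12% conversion rates at two sample sizes.
small = (10, 100, 12, 100)            # 100 users per group
large = (1000, 10000, 1200, 10000)    # 10,000 users per group

lift_small = raw_lift(*small)
lift_large = raw_lift(*large)
z_small, p_small = two_proportion_z_test(*small)
z_large, p_large = two_proportion_z_test(*large)
```

In this made-up data the raw lift is two percentage points in both cases, yet the formal test cannot distinguish it from noise with 100 users per group and finds it highly significant with 10,000. That is the core reason a raw comparison alone can mislead when samples are small or variance is high.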