
You are reviewing a machine learning model that has been trained and validated, and the team wants to know whether it is actually good enough to ship. The model will be used to make decisions that have real business impact, so a single score may not tell the full story.
What metrics would you use to evaluate this model's performance?