
You are evaluating a classification model for a finance use case where the positive class is rare. Accuracy looks high, but the team is not confident it reflects real performance on the minority class. The business wants to know which metrics matter most and how to judge whether the current threshold is appropriate.
How would you evaluate a classification model on an imbalanced dataset of the kind common in finance?
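
To make the concern in the scenario concrete, here is a minimal sketch of how accuracy can stay flat while minority-class performance changes with the threshold. The labels and scores are invented for illustration only (20 cases, 2 positives), and the helper names `confusion_counts` and `metrics_at` are hypothetical, not taken from any library. Reporting precision as 0.0 when nothing is predicted positive is a convention chosen for this sketch.

```python
# Toy data, invented for illustration: 18 negatives followed by 2 positives.
y_true = [0] * 18 + [1, 1]
scores = [
    0.02, 0.03, 0.05, 0.05, 0.08, 0.10, 0.10, 0.12, 0.15,
    0.15, 0.20, 0.20, 0.25, 0.30, 0.30, 0.35, 0.45, 0.60,  # negatives
    0.40, 0.70,                                            # positives
]


def confusion_counts(y_true, scores, threshold):
    """Count TP, FP, TN, FN when predicting positive for score >= threshold."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def metrics_at(y_true, scores, threshold):
    """Return accuracy, precision, and recall at a given threshold."""
    tp, fp, tn, fn = confusion_counts(y_true, scores, threshold)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        # Convention for this sketch: 0.0 when the denominator is zero.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Sweep a few thresholds to see the trade-off on the minority class.
for threshold in (0.9, 0.5, 0.4):
    m = metrics_at(y_true, scores, threshold)
    print(
        f"threshold={threshold:.1f} "
        f"accuracy={m['accuracy']:.2f} "
        f"precision={m['precision']:.2f} "
        f"recall={m['recall']:.2f}"
    )
```

On this toy data, accuracy is 0.90 at all three thresholds, while recall on the positive class is 0.00, 0.50, and 1.00 respectively. That is the pattern the team is worried about: accuracy alone cannot distinguish a model that misses every positive from one that catches them all. Which threshold is appropriate depends on the relative cost of a missed positive versus a false alarm, which the business would need to supply.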