You are reviewing a classifier on a dataset where the positive class is rare, and the team is worried that standard evaluation, such as overall accuracy, can give a misleading picture of performance. You need to explain how you would judge whether the model is actually useful when most examples belong to the negative class.
How do you evaluate a model on imbalanced datasets?
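To make the concern concrete, here is a minimal sketch with made-up numbers. It assumes a hypothetical test set with a 1% positive rate (990 negatives, 10 positives) and compares two hypothetical prediction sets: a trivial baseline that always predicts the negative class, and an illustrative model that catches 8 of the 10 positives at the cost of 20 false alarms. The `evaluate` helper is a hypothetical name written in plain Python rather than a call into any particular library. It reports accuracy alongside precision, recall, and F1 for the positive class, which is one common choice of metrics, not the only one.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def evaluate(y_true, y_pred):
    """Hypothetical helper: accuracy plus positive-class precision, recall, F1.

    Precision, recall, and F1 fall back to 0.0 when their denominators are
    zero (e.g. a model that never predicts the positive class).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # positives caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # positives missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # negatives correctly ignored
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return Metrics(accuracy, precision, recall, f1)


# Hypothetical labels: 990 negatives followed by 10 positives (1% positive rate).
y_true = [0] * 990 + [1] * 10

# Trivial baseline: always predict the majority (negative) class.
always_negative = [0] * len(y_true)

# Hypothetical model: 970 true negatives, 20 false positives,
# 8 true positives, 2 false negatives.
model = [0] * 970 + [1] * 20 + [1] * 8 + [0] * 2

for name, preds in [("always_negative", always_negative), ("model", model)]:
    m = evaluate(y_true, preds)
    print(f"{name}: accuracy={m.accuracy:.3f} precision={m.precision:.3f} "
          f"recall={m.recall:.3f} f1={m.f1:.3f}")
# always_negative: accuracy=0.990 precision=0.000 recall=0.000 f1=0.000
# model: accuracy=0.978 precision=0.286 recall=0.800 f1=0.421
```

In this illustration the baseline scores higher on accuracy while finding none of the positives, which is exactly the misleading picture the team is worried about. The positive-class metrics separate the two.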