You are evaluating a binary classifier that predicts whether a client will default within 12 months so the risk team can trigger early intervention. The model is a gradient-boosted tree that outputs a probability score, and accounts above a 0.60 threshold are flagged for outreach. In offline validation, stakeholders focused on overall accuracy, but the portfolio manager is now asking whether the model is actually useful for prioritizing high-risk accounts. You need to explain what precision, recall, and ROC-AUC say about performance and whether the current threshold is appropriate.

| Metric | Validation Set |
|---|---|
| Positive class prevalence | 8.0% |
| Accuracy | 94.0% |
| Precision | 0.52 |
| Recall | 0.39 |
| F1 Score | 0.45 |
| ROC-AUC | 0.86 |
| Accounts flagged at threshold 0.60 | 600 of 10,000 |
| True defaults in sample | 800 of 10,000 |
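
As a sanity check on the table, the confusion-matrix counts can be backed out from the reported precision, recall, and totals. The sketch below takes those figures at face value; the variable names are illustrative, and the rounding to whole accounts is an assumption.

```python
# Figures copied from the validation table.
n_total = 10_000
n_positive = 800   # true defaults in the sample
n_flagged = 600    # accounts scored above the 0.60 threshold
precision = 0.52
recall = 0.39

# precision = TP / flagged and recall = TP / positives,
# so both should imply the same number of true positives.
tp_from_precision = round(precision * n_flagged)
tp_from_recall = round(recall * n_positive)

tp = tp_from_precision
fp = n_flagged - tp            # flagged accounts that did not default
fn = n_positive - tp           # defaults the model missed
tn = n_total - tp - fp - fn    # correctly ignored non-defaults

f1 = 2 * tp / (2 * tp + fp + fn)
implied_accuracy = (tp + tn) / n_total

# Baseline: predict "no default" for every account.
baseline_accuracy = (n_total - n_positive) / n_total

# Lift: how much richer in defaults the flagged group is than the portfolio.
lift = precision / (n_positive / n_total)

print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"F1={f1:.3f} implied accuracy={implied_accuracy:.4f}")
print(f"always-negative baseline accuracy={baseline_accuracy:.2f} lift={lift:.1f}x")
```

Under these assumptions the counts come out to 312 true positives, 288 false positives, 488 missed defaults, and 8,912 true negatives. Two things follow. First, a model that flags nobody would already score 92% accuracy at 8% prevalence, so accuracy says little about usefulness here. Second, the implied accuracy is about 92.2%, which does not match the reported 94.0%; the table alone does not show which figure is off, so it is worth confirming with whoever produced the validation report.
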

How would you evaluate this model using precision, recall, and ROC-AUC, and what would you recommend to the risk team about whether to keep or adjust the current decision threshold?
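
One way to ground the threshold discussion is to tabulate precision, recall, and outreach volume at several candidate cutoffs. ROC-AUC summarizes ranking quality across all thresholds, so on its own it cannot say whether 0.60 is the right operating point. The sketch below is a minimal illustration: `threshold_table` is a hypothetical helper, and the labels and scores are toy values invented for the example, not taken from the validation set.

```python
from typing import List, Sequence, Tuple


def threshold_table(
    y_true: Sequence[int],
    scores: Sequence[float],
    thresholds: Sequence[float],
) -> List[Tuple[float, int, float, float]]:
    """Return (threshold, n_flagged, precision, recall) for each cutoff.

    An account is flagged when its score is strictly above the threshold,
    matching the "above 0.60" rule in the scenario.
    """
    n_pos = sum(y_true)
    rows = []
    for t in thresholds:
        flagged_labels = [y for y, s in zip(y_true, scores) if s > t]
        n_flagged = len(flagged_labels)
        tp = sum(flagged_labels)
        precision = tp / n_flagged if n_flagged else float("nan")
        recall = tp / n_pos if n_pos else float("nan")
        rows.append((t, n_flagged, precision, recall))
    return rows


# Toy data for illustration only (1 = defaulted, 0 = did not default).
y_true = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
scores = [0.91, 0.72, 0.65, 0.55, 0.48, 0.40, 0.35, 0.22, 0.10, 0.05]

rows = threshold_table(y_true, scores, thresholds=[0.60, 0.45, 0.30])
for t, n_flagged, precision, recall in rows:
    print(f"threshold={t:.2f} flagged={n_flagged} "
          f"precision={precision:.2f} recall={recall:.2f}")
```

On the real validation scores, the same table would let the risk team pick the lowest threshold whose flagged volume still fits outreach capacity, trading some precision for a larger share of defaults caught.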