You are the data science (DS) owner of a gradient-boosted binary classifier that predicts default risk for a digital consumer lending product. Applicants whose score exceeds a threshold of 0.40 are automatically declined; the rest are either approved or routed to manual review. The business team is unhappy: portfolio losses fell only slightly after launch, while approved-loan volume dropped enough to hurt revenue. You have been asked which metrics should drive evaluation in this business context and whether the current model actually outperforms the previous rule-based policy.

| Metric | Previous Rule Policy | Current Model |
|---|---|---|
| Accuracy | 0.78 | 0.84 |
| Precision (default class) | 0.41 | 0.63 |
| Recall (default class) | 0.74 | 0.46 |
| F1 Score | 0.53 | 0.53 |
| AUC-ROC | 0.71 | 0.86 |
| Log Loss | 0.58 | 0.41 |
| Approval Rate | 68% | 52% |
| Default Rate on Approved Loans | 6.8% | 5.9% |
| Monthly Net Revenue | 4.2M | 4.0M |
| Manual Review Rate | 9% | 18% |
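
The identical F1 scores hide opposite trade-offs: the model raises precision on the default class but lowers recall. A minimal sketch, using only the precision and recall values from the table, recomputes the F-beta score (the weighted harmonic mean of precision and recall) at a few values of beta. It shows how the ranking flips once the two error types are weighted unequally. The beta values are illustrative choices, not derived from the business case, and `f_beta` and `POLICIES` are hypothetical names.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score: beta > 1 weights recall more heavily, beta < 1 weights precision."""
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall for the default class, copied from the comparison table.
POLICIES = {
    "Previous Rule Policy": (0.41, 0.74),
    "Current Model": (0.63, 0.46),
}

# Illustrative beta values: 0.5 favors precision, 1.0 is F1, 2.0 favors recall.
for beta in (0.5, 1.0, 2.0):
    scores = {name: round(f_beta(p, r, beta), 3) for name, (p, r) in POLICIES.items()}
    print(f"beta={beta}: {scores}")
```

With these inputs, both policies score about 0.53 at beta = 1 (0.528 vs 0.532). The rule policy wins at beta = 2 (about 0.64 vs 0.49), and the model wins at beta = 0.5 (about 0.45 vs 0.59). Which weighting is appropriate depends on how the cost of a missed defaulter compares with the cost of a wrongly declined good applicant, and the table does not state either cost.
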

How would you evaluate this model in a business context, and which metrics would you prioritize given the three-way trade-off among credit losses, approval volume, and operational cost?
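One way to make that trade-off explicit is to translate the table's rates into expected contribution per applicant. This is a minimal sketch under a simple linear unit-economics model that is not part of the original case: every repaid loan earns a fixed margin, every defaulted loan loses a fixed amount, and every manual review costs a fixed amount. It also assumes that applicant volume is unchanged and that the approval rate and the default rate on approved loans describe the same applicant pool. `PolicyOutcome`, `contribution_per_applicant`, and `break_even_loss` are hypothetical names, and the margin, loss, and review-cost inputs are placeholders for the lender's actual figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyOutcome:
    """Observed rates for one decision policy (all values are fractions)."""
    approval_rate: float          # share of applicants approved
    default_rate_approved: float  # share of approved loans that default
    manual_review_rate: float     # share of applicants routed to manual review

    def repaid_share(self) -> float:
        # Approved loans that repay, as a share of all applicants.
        return self.approval_rate * (1 - self.default_rate_approved)

    def defaulted_share(self) -> float:
        # Approved loans that default, as a share of all applicants.
        return self.approval_rate * self.default_rate_approved


def contribution_per_applicant(p: PolicyOutcome, margin_per_repaid: float,
                               loss_per_default: float, cost_per_review: float) -> float:
    """Expected contribution per applicant under the hypothetical linear cost model."""
    return (p.repaid_share() * margin_per_repaid
            - p.defaulted_share() * loss_per_default
            - p.manual_review_rate * cost_per_review)


def break_even_loss(old: PolicyOutcome, new: PolicyOutcome,
                    margin_per_repaid: float, cost_per_review: float) -> float:
    """Loss per default above which `new` earns more per applicant than `old`."""
    d_repaid = new.repaid_share() - old.repaid_share()
    d_defaulted = new.defaulted_share() - old.defaulted_share()
    d_review = new.manual_review_rate - old.manual_review_rate
    if d_defaulted >= 0:
        raise ValueError("this helper assumes the new policy reduces defaults")
    # new beats old when d_repaid*m - d_defaulted*L - d_review*c > 0; dividing by
    # d_defaulted < 0 flips the inequality to L > (d_repaid*m - d_review*c) / d_defaulted.
    return (d_repaid * margin_per_repaid - d_review * cost_per_review) / d_defaulted


if __name__ == "__main__":
    # Rates from the comparison table; money is expressed in units of margin per repaid loan.
    rule_policy = PolicyOutcome(approval_rate=0.68, default_rate_approved=0.068, manual_review_rate=0.09)
    model_policy = PolicyOutcome(approval_rate=0.52, default_rate_approved=0.059, manual_review_rate=0.18)
    for review_cost in (0.0, 0.5):  # hypothetical review costs
        loss = break_even_loss(rule_policy, model_policy, margin_per_repaid=1.0,
                               cost_per_review=review_cost)
        print(f"review cost {review_cost}: model wins if loss per default > {loss:.2f}x margin")
```

Take the table's rates with zero review cost and losses measured in units of margin per repaid loan. Under those inputs the break-even is about 9.3: the current model comes out ahead only if a defaulted loan costs more than roughly nine times what a repaid loan earns. Any positive review cost raises that bar, because the model doubles the manual review rate.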