LendWise has trained a binary classification model to pre-approve personal loan applications. The model is meant to reduce manual underwriting volume while keeping default risk within policy limits. A pilot on recent applications shows aggregate accuracy of 0.89, above the 0.88 ship target, but the risk and operations teams disagree on whether the model is ready to ship.
| Metric | Validation Set | Pilot Holdout | Ship Target |
|---|---|---|---|
| Accuracy | 0.91 | 0.89 | >= 0.88 |
| Precision (share of approved loans that stay current) | 0.93 | 0.90 | >= 0.92 |
| Recall (share of good borrowers who are approved) | 0.78 | 0.72 | >= 0.75 |
| F1 Score | 0.85 | 0.80 | >= 0.83 |
| AUC-ROC | 0.94 | 0.90 | >= 0.91 |
| Log Loss | 0.21 | 0.29 | <= 0.25 |
| Calibration error | 0.03 | 0.08 | <= 0.05 |
| Manual review rate | 18% | 27% | <= 20% |
| 90-day default rate on approved loans | 3.1% | 4.8% | <= 4.0% |
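
The pass/fail reading of the table can be checked mechanically. The sketch below hard-codes the pilot-holdout column and ship targets from the table, compares each metric with its threshold, and recomputes F1 as the harmonic mean of precision and recall to confirm the reported F1 values are consistent. The names `PILOT`, `TARGETS`, `check_targets`, and `f1` and the metric keys are hypothetical, chosen for illustration rather than taken from any LendWise code. Run as a script, it reports PASS only for accuracy.

```python
import operator

# Pilot-holdout column and ship targets, copied from the table.
# Metric keys are hypothetical names, not identifiers from a real codebase.
PILOT = {
    "accuracy": 0.89,
    "precision": 0.90,
    "recall": 0.72,
    "f1": 0.80,
    "auc_roc": 0.90,
    "log_loss": 0.29,
    "calibration_error": 0.08,
    "manual_review_rate": 0.27,
    "default_rate_90d": 0.048,
}

# ">=" for higher-is-better metrics, "<=" for lower-is-better ones.
TARGETS = {
    "accuracy": (">=", 0.88),
    "precision": (">=", 0.92),
    "recall": (">=", 0.75),
    "f1": (">=", 0.83),
    "auc_roc": (">=", 0.91),
    "log_loss": ("<=", 0.25),
    "calibration_error": ("<=", 0.05),
    "manual_review_rate": ("<=", 0.20),
    "default_rate_90d": ("<=", 0.040),
}

COMPARE = {">=": operator.ge, "<=": operator.le}


def check_targets(values: dict, targets: dict) -> dict:
    """Map each metric to True if its value meets the ship target."""
    return {
        m: COMPARE[op](values[m], threshold)
        for m, (op, threshold) in targets.items()
    }


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    for metric, ok in check_targets(PILOT, TARGETS).items():
        print(f"{metric:<20} {'PASS' if ok else 'FAIL'}")
    # Recompute F1 from the table's precision and recall as a consistency check.
    print(f"validation F1: {f1(0.93, 0.78):.2f}")  # table reports 0.85
    print(f"pilot F1:      {f1(PILOT['precision'], PILOT['recall']):.2f}")  # table reports 0.80
```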
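
The pilot holdout is worse than offline validation on every metric, with the sharpest drops in recall (0.78 to 0.72), calibration error (0.03 to 0.08), and 90-day default rate on approved loans (3.1% to 4.8%). It meets only the accuracy target and misses the other eight, including the manual review rate (27% against a 20% target). Leadership wants a recommendation on whether to ship the model now, ship it behind guardrails, or hold it for further improvement.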
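The brief does not say how "Calibration error" is computed. One common definition, assumed here, is a binned expected calibration error (ECE): group predictions into equal-width probability bins, take the gap between the mean predicted probability and the observed rate of positive outcomes in each bin, and average those gaps weighted by bin size. Under that assumption, a pilot value of 0.08 against a validation value of 0.03 would mean the model's predicted probabilities sit further, on average, from observed outcome rates on recent applications. The function name, the ten-bin default, and treating "stays current" as the positive class are illustrative assumptions, and the toy scores are hypothetical, not pilot data.

```python
from typing import Sequence


def expected_calibration_error(
    probs: Sequence[float], labels: Sequence[int], n_bins: int = 10
) -> float:
    """Binned calibration error for a binary classifier (an ECE variant).

    probs:  predicted probability of the positive class (assumed here to be
            "borrower stays current"), each in [0, 1].
    labels: observed outcomes, 1 = positive, 0 = negative.
    Returns the bin-size-weighted mean of |observed rate - mean predicted prob|.
    """
    if len(probs) != len(labels) or len(probs) == 0:
        raise ValueError("probs and labels must be non-empty and equal length")

    # Equal-width bins over [0, 1]; p == 1.0 is clamped into the last bin.
    bins: list[list[tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))

    total = len(probs)
    ece = 0.0
    for b in bins:
        if not b:
            continue  # empty bins contribute nothing
        mean_prob = sum(p for p, _ in b) / len(b)
        observed_rate = sum(y for _, y in b) / len(b)
        ece += (len(b) / total) * abs(observed_rate - mean_prob)
    return ece


if __name__ == "__main__":
    # Hypothetical toy scores, not pilot data.
    probs = [0.9, 0.9, 0.9, 0.9, 0.1, 0.1]
    labels = [1, 1, 1, 0, 0, 0]
    print(f"{expected_calibration_error(probs, labels):.4f}")  # 0.1333
```

Equal-width bins are the simplest choice; when few applicants fall in some probability ranges, those bins become sparse and noisy, and equal-mass (quantile) bins are a common alternative.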