You own a gradient-boosted probability-of-default model used in a digital lending flow to price unsecured personal loans and trigger manual review for applications with predicted default risk above 8%. The model was trained on two years of historical application and repayment data and approved for production after strong holdout performance. Three months after launch, finance partners report that booked loans are defaulting more often than forecast, especially in newer applicant segments, even though the model's ranking metrics still look acceptable. You are asked how you would validate whether the model is still reliable enough for underwriting and pricing decisions.

| Metric | Offline Validation | Last 90 Days Production |
|---|---|---|
| AUC-ROC | 0.81 | 0.79 |
| Log Loss | 0.41 | 0.49 |
| Brier Score | 0.118 | 0.156 |
| MAE (predicted PD vs observed default rate by decile) | 1.9 pp | 4.8 pp |
| RMSE (predicted PD vs observed default rate by decile) | 2.6 pp | 6.7 pp |
| Avg predicted default rate | 6.2% | 6.5% |
| Actual default rate | 6.4% | 9.1% |
| Approval rate | 58% | 61% |
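
For reference, the decile-level MAE and RMSE rows compare mean predicted PD with the observed default rate inside each risk decile, and the last two rate rows imply an observed-to-expected ratio of 9.1 / 6.5 = 1.4 in production. The sketch below shows one plausible way such figures could be computed. The function name `decile_calibration`, the equal-count binning, and the synthetic data in the usage lines are illustrative assumptions, not the pipeline that produced the table.

```python
import numpy as np


def decile_calibration(pred_pd, defaulted, n_bins=10):
    """Compare mean predicted PD with observed default rate per risk bin.

    Hypothetical helper: assumes equal-count bins ordered by predicted PD
    and at least n_bins observations.
    """
    pred_pd = np.asarray(pred_pd, dtype=float)
    defaulted = np.asarray(defaulted, dtype=float)

    # Sort loans by predicted risk, then split into near-equal-size bins.
    order = np.argsort(pred_pd, kind="stable")
    bins = np.array_split(order, n_bins)

    mean_pred = np.array([pred_pd[b].mean() for b in bins])
    obs_rate = np.array([defaulted[b].mean() for b in bins])

    # Calibration gap per bin, expressed in percentage points.
    gap_pp = 100.0 * (obs_rate - mean_pred)

    return {
        "mean_pred": mean_pred,
        "obs_rate": obs_rate,
        "mae_pp": float(np.mean(np.abs(gap_pp))),
        "rmse_pp": float(np.sqrt(np.mean(gap_pp ** 2))),
        # Calibration-in-the-large: overall observed rate / overall predicted rate.
        "observed_to_expected": float(defaulted.mean() / pred_pd.mean()),
    }


if __name__ == "__main__":
    # Synthetic illustration only: true risk is set to 1.4x the predicted PD,
    # so ranking is preserved while calibration is off.
    rng = np.random.default_rng(0)
    pred = rng.beta(2, 28, size=20_000)
    outcome = rng.random(20_000) < np.clip(1.4 * pred, 0.0, 1.0)
    report = decile_calibration(pred, outcome)
    print(report["mae_pp"], report["rmse_pp"], report["observed_to_expected"])
```

In the synthetic usage lines the ordering of applicants is unchanged while every bin under-predicts, which is the same pattern the table shows: AUC-ROC barely moves while the calibration-sensitive rows degrade.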

How would you validate this model end to end given these results, and what would you recommend changing before relying on it for ongoing lending decisions?