BCG Digital Ventures has deployed a binary classifier in a claims triage workflow to identify insurance claims that should be escalated for manual fraud review. The current model is a gradient-boosted tree classifier used for production scoring, but fraud operations reports that too many suspicious claims are still being approved automatically.

| Metric | Current Model (Validation Set) | Prior Model | Change |
|---|---|---|---|
| Precision | 0.84 | 0.76 | +0.08 |
| Recall | 0.58 | 0.71 | -0.13 |
| F1-score | 0.69 | 0.73 | -0.04 |
| ROC-AUC | 0.87 | 0.84 | +0.03 |
| Review rate | 6.2% | 9.8% | -3.6 pts |
| Fraud prevalence | 4.0% | 4.0% | 0.0 pts |

The current model's confusion matrix on a validation sample of 50,000 claims:

| | Predicted Fraud | Predicted Non-Fraud |
|---|---|---|
| Actual Fraud | 1,160 | 840 |
| Actual Non-Fraud | 221 | 47,779 |
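
As a sanity check, the headline metrics can be recomputed directly from the four confusion-matrix counts. The sketch below uses only the numbers in the matrix; the variable names are illustrative and not part of the source.

```python
# Confusion-matrix counts copied from the validation table.
tp, fn = 1_160, 840     # actual fraud: flagged / missed
fp, tn = 221, 47_779    # actual non-fraud: flagged / cleared

total = tp + fn + fp + tn

precision = tp / (tp + fp)            # share of flagged claims that are fraud
recall = tp / (tp + fn)               # share of fraud that gets flagged
f1 = 2 * precision * recall / (precision + recall)
prevalence = (tp + fn) / total        # share of claims that are fraud
flagged_share = (tp + fp) / total     # share of claims predicted as fraud

print(f"claims:        {total:,}")
print(f"precision:     {precision:.2f}")
print(f"recall:        {recall:.2f}")
print(f"F1-score:      {f1:.2f}")
print(f"prevalence:    {prevalence:.1%}")
print(f"flagged share: {flagged_share:.1%}")
```

Precision, recall, F1-score, and fraud prevalence reproduce the current model's column in the metrics table. The flagged share implied by the matrix is about 2.8%, which does not match the 6.2% review rate reported there. The source does not say whether the review rate includes referrals from outside the model, so that figure is worth confirming before it is used in any capacity planning.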

Leadership wants to know whether this model is actually better than the prior version and whether the operating threshold is appropriate. The metrics appear mixed: precision and ROC-AUC improved, but recall and F1-score declined materially.
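
One way to make the trade-off concrete is to convert both models' precision and recall into implied error counts and ask what cost ratio would justify the current operating point. The sketch below is illustrative only. It assumes the prior model's rounded rates apply to a comparable sample of 50,000 claims at 4.0% prevalence, and that total cost is linear in missed fraud and unnecessary reviews. The `OperatingPoint` class and its `counts` method are hypothetical helpers, not part of any production code.

```python
from dataclasses import dataclass


@dataclass
class OperatingPoint:
    """A model's precision and recall at its chosen threshold."""
    precision: float
    recall: float

    def counts(self, n_claims: int, prevalence: float) -> tuple[float, float, float]:
        """Return (true positives, false positives, false negatives) implied by the rates."""
        n_fraud = n_claims * prevalence
        tp = self.recall * n_fraud
        fp = tp * (1 - self.precision) / self.precision
        fn = n_fraud - tp
        return tp, fp, fn


# Assumption: both models are evaluated on the same sample size and prevalence.
N_CLAIMS = 50_000
PREVALENCE = 0.04

current = OperatingPoint(precision=0.84, recall=0.58)
prior = OperatingPoint(precision=0.76, recall=0.71)

_, fp_current, fn_current = current.counts(N_CLAIMS, PREVALENCE)
_, fp_prior, fn_prior = prior.counts(N_CLAIMS, PREVALENCE)

extra_missed_fraud = fn_current - fn_prior       # fraud the current model lets through
avoided_false_alarms = fp_prior - fp_current     # needless reviews the current model saves

# Under a linear cost model, the current operating point is cheaper only if
# (cost of one needless review) / (cost of one missed fraud) exceeds this ratio.
break_even_ratio = extra_missed_fraud / avoided_false_alarms

print(f"extra missed fraud:   {extra_missed_fraud:.0f}")
print(f"avoided false alarms: {avoided_false_alarms:.0f}")
print(f"break-even cost ratio (review / missed fraud): {break_even_ratio:.2f}")
```

With the table's rounded rates, this gives roughly 260 additional missed fraud cases against about 227 fewer false alarms per 50,000 claims, for a break-even ratio of about 1.14. Under these assumptions, the current threshold pays off only if a needless manual review costs more than a missed fraudulent claim, which is consistent with the complaint from fraud operations. Because ROC-AUC improved, lowering the threshold on the current model may recover recall at a precision comparable to the prior model's, but confirming that requires the full score distribution, which the source does not provide.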