SureShield Insurance built a binary classification model to predict whether a newly submitted claim will become a high-cost fraudulent claim requiring special investigation. The dataset is highly imbalanced: only 1.8% of historical claims are labeled fraud. The team initially optimized for accuracy, but fraud losses remain high and investigators say too many risky claims are being missed.
| Metric | Validation Set | Notes |
|---|---|---|
| Accuracy | 0.972 | High due to class imbalance |
| Precision | 0.41 | 41% of flagged claims are actually fraud |
| Recall | 0.29 | Model catches less than one-third of fraud cases |
| F1 Score | 0.34 | Weak balance between precision and recall |
| AUC-ROC | 0.86 | Good ranking overall |
| Log Loss | 0.118 | Probabilities are moderately informative |
| Fraud rate | 1.8% | 1,800 fraud cases in 100,000 claims |
| Claims flagged for review | 1,275 | Limited by investigation capacity |
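The table's precision, recall, and F1 figures are mutually consistent, which lets us back-solve the approximate confusion matrix. A minimal sketch (the exact counts are assumptions derived from the reported metrics, not from the raw data):

```python
# Back-solving the confusion matrix implied by the reported metrics.
total_claims = 100_000
total_fraud = 1_800          # 1.8% fraud rate
flagged = 1_275              # claims sent to investigators

tp = round(flagged * 0.41)   # precision 0.41 -> ~523 correctly flagged frauds
fp = flagged - tp            # legitimate claims investigated anyway
fn = total_fraud - tp        # fraud cases the model misses
tn = total_claims - tp - fp - fn

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

# Accuracy is a misleading yardstick here: a model that flags nothing
# would score 1 - 0.018 = 0.982 by always predicting "legitimate".
majority_baseline = 1 - total_fraud / total_claims

print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
print(f"majority-class accuracy baseline={majority_baseline:.3f}")
```

The recomputed values land on 0.41 / 0.29 / 0.34, matching the table, and the 0.982 do-nothing baseline shows why raw accuracy rewards ignoring the fraud class entirely.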
The VP of Claims wants a recommendation for a single primary metric to drive model selection and threshold tuning. The recommendation must account for the severe class imbalance and the asymmetric business costs: a missed fraudulent claim costs about $12,000 on average, while investigating a legitimate claim costs about $85.
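Given the cost asymmetry, threshold tuning can be framed directly as expected-cost minimization rather than optimizing a generic metric. A minimal sketch, assuming the two cost figures from the case; the function names and the tiny scored-claims example are illustrative, not part of the actual model:

```python
COST_MISS = 12_000    # average loss on a missed fraudulent claim
COST_REVIEW = 85      # cost of investigating any claim

def expected_cost(y_true, scores, threshold):
    """Total cost of investigating every claim scoring >= threshold."""
    cost = 0
    for label, score in zip(y_true, scores):
        if score >= threshold:
            cost += COST_REVIEW    # pay the investigation cost, fraud or not
        elif label == 1:
            cost += COST_MISS      # missed fraud goes unrecovered
    return cost

def best_threshold(y_true, scores):
    """Pick the score cutoff that minimizes total expected cost."""
    return min(sorted(set(scores)),
               key=lambda t: expected_cost(y_true, scores, t))

# Break-even: investigating pays off whenever P(fraud) * COST_MISS > COST_REVIEW,
# i.e. for any claim with predicted fraud probability above roughly 0.007.
break_even = COST_REVIEW / COST_MISS

# Illustrative toy data: two frauds (1) and two legitimate claims (0).
y = [1, 0, 0, 1]
p = [0.9, 0.1, 0.2, 0.05]
print(best_threshold(y, p), f"break-even p={break_even:.4f}")
```

With a 141:1 cost ratio, the cost-optimal threshold sits far below 0.5, which is why recall (subject to the 1,275-claim review capacity) matters much more here than overall accuracy.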