FinSure uses a binary classification model to detect potentially fraudulent card transactions before authorization. The current model has reduced chargeback losses, but the risk team believes it is still missing too many fraud cases while sending too many legitimate transactions to manual review.
The model scores 5,000,000 transactions per month. Fraud prevalence is 0.8% (40,000 actual fraud cases). At the current decision threshold of 0.60, the offline holdout results are:
| Metric | Value |
|---|---|
| Precision | 0.62 |
| Recall | 0.54 |
| F1 Score | 0.58 |
| AUC-ROC | 0.91 |
| PR-AUC | 0.49 |
| False Positive Rate | 0.27% |
| Transactions flagged for review | 34,839 |
| Fraud caught | 21,600 |
| Fraud missed | 18,400 |
Manual review capacity is limited, and each false positive creates customer friction. However, each missed fraud event is expensive. The VP of Risk wants to know whether the model is actually performing well for this use case, whether the threshold is appropriate, and what should be improved before the next release.
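One way to frame the threshold question is to minimize expected cost rather than F1, since the brief makes clear that a missed fraud event and an unnecessary review carry very different costs. The sketch below is purely illustrative: the per-event costs, the score distributions, and the candidate threshold grid are all assumptions, not FinSure's actual numbers or model outputs.

```python
# Hypothetical cost-weighted threshold sweep. The cost figures
# (COST_FN, COST_FP) and the synthetic score distributions are
# ASSUMPTIONS for illustration only -- substitute real holdout
# scores and the risk team's cost estimates in practice.

import random

random.seed(0)
COST_FN = 500.0   # assumed average loss per missed fraud event
COST_FP = 5.0     # assumed cost per unnecessary manual review

# Toy scores: fraud skews high, legitimate skews low (0.8% prevalence).
fraud_scores = [min(1.0, max(0.0, random.gauss(0.7, 0.2))) for _ in range(800)]
legit_scores = [min(1.0, max(0.0, random.gauss(0.2, 0.15))) for _ in range(99_200)]

def expected_cost(threshold):
    """Total expected cost at a given decision threshold."""
    fn = sum(s < threshold for s in fraud_scores)    # missed fraud
    fp = sum(s >= threshold for s in legit_scores)   # needless reviews
    return fn * COST_FN + fp * COST_FP

# Pick the cost-minimizing threshold over a coarse grid.
best = min((t / 100 for t in range(5, 96)), key=expected_cost)
print(f"cost-minimizing threshold ~ {best:.2f}")
```

Because review capacity is capped, a realistic version would also constrain the flagged-transaction count; the point here is only that the operating threshold should fall out of the cost structure, not default to a round number like 0.60.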