StripeShield uses a binary classification model to flag card transactions for fraud review before approval. The current model has reduced chargebacks overall, but the Risk team is concerned that its decision threshold is too conservative: it flags too few transactions and lets costly fraud through.
The model was evaluated on 1,000,000 recent transactions with an observed fraud rate of 0.8%.
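A 0.8% fraud rate means the evaluation set is heavily imbalanced. This minimal sketch derives the implied class counts and the accuracy of a trivial baseline that labels every transaction legitimate. The variable names are illustrative and are not part of StripeShield.

```python
# Class counts implied by the evaluation set described above.
N_TRANSACTIONS = 1_000_000
FRAUD_RATE = 0.008

n_fraud = round(N_TRANSACTIONS * FRAUD_RATE)  # fraudulent transactions
n_legit = N_TRANSACTIONS - n_fraud            # legitimate transactions

# A trivial baseline that predicts "legitimate" for everything is correct
# on every legitimate transaction and wrong on every fraudulent one.
baseline_accuracy = n_legit / N_TRANSACTIONS

print(n_fraud, n_legit, baseline_accuracy)
```

Because this baseline already reaches 0.992 accuracy without catching any fraud, accuracy says little about fraud performance here. Precision and recall are the more informative metrics.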
| Metric | Current Model |
|---|---|
| Precision | 0.64 |
| Recall | 0.40 |
| F1 Score | 0.49 |
| Accuracy | 0.993 |
| AUC-ROC | 0.91 |
| Fraud rate | 0.008 |
| Transactions flagged positive | 5,000 |
Confusion matrix counts:
| | Predicted Fraud | Predicted Legitimate |
|---|---|---|
| Actual Fraud | 3,200 | 4,800 |
| Actual Legitimate | 1,800 | 990,200 |
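As a consistency check, the headline metrics can be recomputed directly from these counts. The minimal Python sketch below does that. The `ConfusionMatrix` class is a hypothetical helper written for this example, not a StripeShield or library API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    tp: int  # actual fraud, predicted fraud
    fn: int  # actual fraud, predicted legitimate
    fp: int  # actual legitimate, predicted fraud
    tn: int  # actual legitimate, predicted legitimate

    @property
    def total(self) -> int:
        return self.tp + self.fn + self.fp + self.tn

    def precision(self) -> float:
        # Share of flagged transactions that are actually fraud.
        return self.tp / (self.tp + self.fp)

    def recall(self) -> float:
        # Share of actual fraud that gets flagged.
        return self.tp / (self.tp + self.fn)

    def f1(self) -> float:
        # Harmonic mean of precision and recall.
        p, r = self.precision(), self.recall()
        return 2 * p * r / (p + r)

    def accuracy(self) -> float:
        return (self.tp + self.tn) / self.total


cm = ConfusionMatrix(tp=3_200, fn=4_800, fp=1_800, tn=990_200)
print(f"flagged   = {cm.tp + cm.fp}")
print(f"precision = {cm.precision():.2f}")
print(f"recall    = {cm.recall():.2f}")
print(f"f1        = {cm.f1():.2f}")
print(f"accuracy  = {cm.accuracy():.4f}")
```

These counts give 5,000 flagged transactions, precision 0.64, recall 0.40, F1 0.49, and accuracy 0.9934.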
Each false negative results in an average fraud loss of $240. Each false positive sends a legitimate transaction to manual review or decline, costing $6 in operations and customer friction. The Head of Risk wants to know whether the model should be tuned for higher recall, even if precision drops.
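The cost figures make the recall/precision trade-off concrete. The sketch below assumes the stated per-error costs apply uniformly and that model scores behave like calibrated fraud probabilities. Under those assumptions, flagging a transaction with fraud probability `p` lowers expected cost when `p * 240 > (1 - p) * 6`. The `expected_cost` helper and the "what-if" operating point (recall 0.60 at precision 0.20) are hypothetical illustrations, not measured results for this model.

```python
# Cost assumptions from the problem statement.
COST_FN = 240.0  # average fraud loss per missed fraudulent transaction ($)
COST_FP = 6.0    # cost per wrongly flagged legitimate transaction ($)
N_FRAUD = 8_000  # actual fraud cases in the 1,000,000-transaction evaluation set


def expected_cost(recall: float, precision: float) -> float:
    """Total error cost on the evaluation set for a given operating point."""
    tp = round(recall * N_FRAUD)
    fn = N_FRAUD - tp
    fp = round(tp * (1 - precision) / precision)  # from precision = tp / (tp + fp)
    return fn * COST_FN + fp * COST_FP


# Flagging lowers expected cost when p * COST_FN > (1 - p) * COST_FP,
# i.e. when the fraud probability p exceeds COST_FP / (COST_FN + COST_FP).
break_even_p = COST_FP / (COST_FN + COST_FP)

current = expected_cost(recall=0.40, precision=0.64)
# Hypothetical operating point for illustration only; not a measured result.
what_if = expected_cost(recall=0.60, precision=0.20)

print(f"break-even fraud probability: {break_even_p:.4f}")
print(f"current cost: ${current:,.0f}")
print(f"what-if cost: ${what_if:,.0f}")
```

At the current operating point, missed fraud costs $1,152,000 and false positives cost $10,800, for a total of $1,162,800. The break-even probability is about 0.024 (6/246), so additional flags pay for themselves if at least roughly 1 in 41 of them is fraud. Equivalently, up to about 40 false positives are acceptable per additional fraud caught. The cost ratio alone therefore points toward a lower threshold. The actual gain depends on how quickly precision falls as recall rises along this model's precision-recall curve.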