BluePay scores card-not-present transactions in real time, classifying each one as approve or send-to-review. The team wants to deploy a new gradient boosting model into the live payments flow, replacing the current rules-based system.
Validation was run on 8.4M recent transactions with delayed fraud labels. The proposed model is evaluated at a review threshold of 0.62.

| Metric | Proposed Model | Current Production Rules | Target / Constraint |
|---|---|---|---|
| Precision | 0.74 | 0.51 | >= 0.70 |
| Recall | 0.81 | 0.68 | >= 0.78 |
| F1 Score | 0.77 | 0.58 | Maximize |
| AUC-ROC | 0.93 | 0.79 | >= 0.90 |
| Log Loss | 0.118 | 0.241 | Lower is better |
| Calibration error | 0.036 | 0.091 | <= 0.05 |
| Review rate | 1.9% | 3.4% | <= 2.2% |
| False positive rate | 0.47% | 1.62% | <= 0.60% |
| Estimated monthly fraud loss | $1.42M | $2.05M | Minimize |
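
The table's hard constraints can be checked mechanically. The sketch below copies the reported values from the table and compares each against its target; it also recomputes F1 from precision and recall as a sanity check. The dictionary layout and the names `TARGETS`, `check_constraints`, and `f1_score` are illustrative assumptions, not part of BluePay's tooling. Rows marked "Maximize", "Minimize", or "Lower is better" have no hard limit and are left out of the check.

```python
import operator

# Hard limits from the "Target / Constraint" column (rates in percent).
TARGETS = {
    "precision": (operator.ge, 0.70),
    "recall": (operator.ge, 0.78),
    "auc_roc": (operator.ge, 0.90),
    "calibration_error": (operator.le, 0.05),
    "review_rate_pct": (operator.le, 2.2),
    "false_positive_rate_pct": (operator.le, 0.60),
}

# Reported values copied from the table.
PROPOSED = {
    "precision": 0.74, "recall": 0.81, "auc_roc": 0.93,
    "calibration_error": 0.036, "review_rate_pct": 1.9,
    "false_positive_rate_pct": 0.47,
}
CURRENT_RULES = {
    "precision": 0.51, "recall": 0.68, "auc_roc": 0.79,
    "calibration_error": 0.091, "review_rate_pct": 3.4,
    "false_positive_rate_pct": 1.62,
}


def check_constraints(values, targets=TARGETS):
    """Return {metric: True/False} for each hard constraint."""
    return {name: op(values[name], limit) for name, (op, limit) in targets.items()}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


proposed_ok = check_constraints(PROPOSED)
current_ok = check_constraints(CURRENT_RULES)

# Relative reduction in estimated monthly fraud loss ($M, from the table).
loss_reduction = (2.05 - 1.42) / 2.05

if __name__ == "__main__":
    for name, passed in proposed_ok.items():
        print(f"{name:26s} proposed={'PASS' if passed else 'FAIL'} "
              f"current={'PASS' if current_ok[name] else 'FAIL'}")
    print(f"F1 (proposed): {f1_score(0.74, 0.81):.2f}")
    print(f"Fraud loss reduction: {loss_reduction:.1%}")
```

On the reported numbers, the proposed model satisfies every hard constraint while the current rules satisfy none, and the estimated monthly fraud loss falls by roughly 31%. Passing these offline checks says nothing by itself about live behavior.
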

The model looks strong offline, but BluePay must decide whether these results are sufficient for a live payments environment, where false declines create customer friction and missed fraud creates direct financial loss.
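
To make the trade-off concrete, here is a minimal sketch of how a review threshold turns model scores into the precision, recall, review rate, and false positive rate reported above. It assumes that a score at or above the threshold sends the transaction to review and that labels use 1 for confirmed fraud and 0 for legitimate; the source does not state either convention. The function name `metrics_at_threshold` and the toy scores are hypothetical, not BluePay data.

```python
def metrics_at_threshold(scores, labels, threshold=0.62):
    """Compute review-queue metrics for one threshold.

    Assumed conventions: score >= threshold means send-to-review,
    label 1 means confirmed fraud, label 0 means legitimate.
    """
    tp = fp = fn = tn = 0
    for score, label in zip(scores, labels):
        flagged = score >= threshold
        if flagged and label == 1:
            tp += 1  # fraud caught
        elif flagged:
            fp += 1  # legitimate transaction sent to review (customer friction)
        elif label == 1:
            fn += 1  # missed fraud (direct financial loss)
        else:
            tn += 1  # legitimate transaction approved
    total = tp + fp + fn + tn
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "review_rate": (tp + fp) / total if total else 0.0,
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
    }


# Toy data for illustration only.
toy_scores = [0.95, 0.70, 0.65, 0.40, 0.30, 0.10, 0.62, 0.05]
toy_labels = [1, 1, 0, 1, 0, 0, 0, 0]
toy_metrics = metrics_at_threshold(toy_scores, toy_labels)

if __name__ == "__main__":
    for name, value in toy_metrics.items():
        print(f"{name}: {value:.3f}")
```

Raising the threshold generally shrinks the review queue and the false positive rate at the cost of recall; lowering it does the opposite. Sweeping `threshold` over the validation set is one way to see how close 0.62 sits to each constraint boundary.
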