BlueLedger uses a gradient-boosted classifier to predict the probability that a card payment will become a confirmed fraud or chargeback within 60 days. Scores above 0.80 are auto-declined, scores from 0.50 to 0.80 go to manual review, and lower scores are approved.
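The routing policy above can be sketched as a small function. This is a minimal illustration only: the name `route`, the constant names, and the returned labels are hypothetical, and the handling of a score of exactly 0.80 (sent to review, since only scores above 0.80 are declined) is one reading of the stated thresholds.

```python
# Thresholds taken from the policy described in the text.
AUTO_DECLINE_ABOVE = 0.80   # scores strictly above this are auto-declined
REVIEW_FROM = 0.50          # scores from this value up to 0.80 go to review


def route(score: float) -> str:
    """Map a predicted fraud probability to a decision label (hypothetical labels)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be a probability in [0, 1], got {score}")
    if score > AUTO_DECLINE_ABOVE:
        return "auto_decline"
    if score >= REVIEW_FROM:
        return "manual_review"
    return "approve"


if __name__ == "__main__":
    for s in (0.03, 0.50, 0.64, 0.80, 0.91):
        print(f"{s:.2f} -> {route(s)}")
```

Because the decision depends on fixed cut-offs, any miscalibration of the score near 0.50 or 0.80 directly changes which transactions are reviewed or declined.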
The model still ranks transactions reasonably well, but risk leaders are concerned that the predicted probabilities are not trustworthy enough for a high-stakes payments decision. In the last quarter, several score bands were materially miscalibrated against realized fraud rates (for example, the 0.30-0.50 band averaged a predicted risk of 0.39 against an observed fraud rate of 0.21), creating avoidable losses and inconsistent review policies.
| Metric | Current Model | Prior Quarter | Change |
|---|---|---|---|
| AUC-ROC | 0.91 | 0.92 | -0.01 |
| Log Loss | 0.184 | 0.156 | +0.028 |
| Brier Score | 0.061 | 0.047 | +0.014 |
| Expected Calibration Error (ECE) | 0.072 | 0.031 | +0.041 |
| Precision @ auto-decline threshold | 0.88 | 0.90 | -0.02 |
| Recall @ auto-decline threshold | 0.41 | 0.44 | -0.03 |
| Monthly fraud loss | $4.8M | $3.9M | +$0.9M |
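AUC-ROC depends only on how transactions are ranked, whereas log loss and the Brier score also penalize probabilities that are too high or too low, which is consistent with AUC-ROC moving little while the other two worsened. The sketch below shows the standard definitions of both metrics in plain Python. The function names and the toy labels and scores are illustrative assumptions, not BlueLedger data or its actual evaluation pipeline.

```python
import math


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood of the 0/1 outcomes under the predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


if __name__ == "__main__":
    # Toy example (made up): 1 = confirmed fraud or chargeback, 0 = legitimate.
    labels = [0, 0, 1, 0, 1]
    scores = [0.05, 0.20, 0.85, 0.40, 0.60]
    print("Brier score:", round(brier_score(labels, scores), 4))
    print("Log loss:   ", round(log_loss(labels, scores), 4))
```

Lower is better for both metrics.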
| Predicted Score Band | Share of Transactions | Avg Predicted Risk | Observed Fraud Rate |
|---|---|---|---|
| 0.00-0.10 | 72% | 0.03 | 0.02 |
| 0.10-0.30 | 18% | 0.19 | 0.11 |
| 0.30-0.50 | 6% | 0.39 | 0.21 |
| 0.50-0.80 | 3% | 0.64 | 0.58 |
| 0.80-1.00 | 1% | 0.91 | 0.97 |
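The band table can be summarized by computing each band's signed gap (average predicted risk minus observed fraud rate) and a share-weighted absolute gap. The sketch below uses only the figures from the table. The function names are hypothetical, and this coarse five-band figure is not expected to reproduce the reported ECE of 0.072, which presumably (an assumption) uses a different binning.

```python
# (band, share of transactions, average predicted risk, observed fraud rate)
BANDS = [
    ("0.00-0.10", 0.72, 0.03, 0.02),
    ("0.10-0.30", 0.18, 0.19, 0.11),
    ("0.30-0.50", 0.06, 0.39, 0.21),
    ("0.50-0.80", 0.03, 0.64, 0.58),
    ("0.80-1.00", 0.01, 0.91, 0.97),
]


def signed_gaps(bands):
    """Predicted minus observed per band; positive means risk is overpredicted."""
    return {name: pred - obs for name, _share, pred, obs in bands}


def weighted_abs_gap(bands):
    """Share-weighted mean absolute gap, an ECE-style summary over these bands."""
    return sum(share * abs(pred - obs) for _name, share, pred, obs in bands)


if __name__ == "__main__":
    for name, gap in signed_gaps(BANDS).items():
        direction = "overpredicts" if gap > 0 else "underpredicts"
        print(f"{name}: gap {gap:+.2f} ({direction})")
    print(f"Share-weighted absolute gap: {weighted_abs_gap(BANDS):.4f}")
```

On these figures the share-weighted absolute gap is about 0.035. The largest per-band gap is in the 0.30-0.50 band, and the 0.80-1.00 band is the only one where observed fraud exceeds the predicted risk.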