LedgerFlow uses a gradient-boosted binary classifier to flag card transactions for post-authorization fraud review. The AI audit team found that a specific $1,240 electronics purchase was flagged as high risk, but the customer claims it was legitimate and wants a clear explanation. The table below compares the model's validation metrics with its performance over the last 30 days in production.

| Metric | Validation Set | Last 30 Days Production |
|---|---|---|
| Precision | 0.78 | 0.74 |
| Recall | 0.69 | 0.81 |
| F1 Score | 0.73 | 0.77 |
| AUC-ROC | 0.91 | 0.89 |
| Log Loss | 0.29 | 0.34 |
| False Positive Rate | 0.021 | 0.034 |
| Calibration Error | 0.04 | 0.11 |
| Review Threshold | 0.65 | 0.65 |
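
Before relying on the table, it can help to confirm that the reported metrics are internally consistent and to quantify the drift between the two columns. The sketch below recomputes F1 as the harmonic mean of precision and recall and compares production against validation for the false positive rate and calibration error. The variable names are illustrative; the numbers are copied from the table.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Values copied from the metrics table.
reported = {
    "validation": {"precision": 0.78, "recall": 0.69, "f1": 0.73},
    "production": {"precision": 0.74, "recall": 0.81, "f1": 0.77},
}

# Recompute F1 and round to the two decimals used in the table.
recomputed = {
    name: round(f1_score(m["precision"], m["recall"]), 2)
    for name, m in reported.items()
}

for name, m in reported.items():
    status = "consistent" if recomputed[name] == m["f1"] else "MISMATCH"
    print(f"{name}: reported F1={m['f1']:.2f}, recomputed={recomputed[name]:.2f} ({status})")

# Ratio of production to validation for the two metrics that moved the most.
fpr_ratio = 0.034 / 0.021          # false positive rate
calibration_ratio = 0.11 / 0.04    # calibration error
print(f"FPR ratio: {fpr_ratio:.2f}x, calibration error ratio: {calibration_ratio:.2f}x")
```

Both F1 values match the table. The false positive rate is roughly 1.6 times its validation value and the calibration error is 2.75 times its validation value, which suggests that a production score should be read as a ranking signal rather than as a literal probability of fraud.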

For the disputed transaction, the model score was 0.87. The top contributing signals shown in the audit tool were: new device (+0.19), merchant-country mismatch (+0.16), transaction amount 3.4 times the user's median (+0.14), two declined attempts in the prior 10 minutes (+0.11), and shipping address changed the same day (+0.09).
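
A quick arithmetic check on these numbers shows how much of the score the listed signals account for and how far the score sits above the review threshold. The sketch below assumes that the contributions are additive on the same scale as the score and that a transaction is flagged when its score is at or above the threshold. Neither assumption is stated in the source, and attribution tools for tree ensembles commonly report contributions on a log-odds scale instead, so treat the result as a sanity check rather than a description of the audit tool.

```python
score = 0.87
threshold = 0.65  # review threshold from the metrics table

# Contributions as displayed by the audit tool.
contributions = {
    "new_device": 0.19,
    "merchant_country_mismatch": 0.16,
    "amount_3.4x_user_median": 0.14,
    "two_declines_prior_10_min": 0.11,
    "shipping_address_changed_same_day": 0.09,
}

listed_total = sum(contributions.values())
residual = score - listed_total   # baseline plus any signals not shown
margin = score - threshold        # distance above the review threshold

# ASSUMPTION: additive contributions on the score scale.
# Count how many of the largest signals must be removed to fall below the threshold.
remaining = score
signals_needed = 0
for value in sorted(contributions.values(), reverse=True):
    if remaining < threshold:
        break
    remaining -= value
    signals_needed += 1

print(f"listed total: {listed_total:.2f}")
print(f"residual (baseline + unlisted): {residual:.2f}")
print(f"margin over threshold: {margin:.2f}")
print(f"largest signals to remove before score < threshold: {signals_needed}")
```

Under these assumptions the five signals account for 0.69 of the 0.87 score, leaving 0.18 unexplained by the displayed list, and the score is 0.22 above the threshold. No single signal is large enough to explain the flag on its own; removing the two largest would bring the score to 0.52. That supports describing the flag to the customer as the result of several unusual factors occurring together, not as a finding that fraud occurred.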
You need to assess whether the flag was reasonable, explain how to communicate the decision to the customer without overstating model certainty, and recommend how to evaluate whether the audit explanation system is trustworthy.
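
One way to probe whether the explanations are faithful to the model is a deletion test: reset features to a neutral baseline in the order the explanation ranks them and watch how quickly the score falls. If the ranking is faithful, the score should drop faster than it does under a different ordering. The sketch below is a minimal illustration only. `toy_score`, its weights, the feature names, and the baseline values are all hypothetical stand-ins, not LedgerFlow's model or data; in practice the production model's scoring function would be passed in instead.

```python
import math
from statistics import mean
from typing import Callable, Dict, List

Features = Dict[str, float]


def deletion_curve(
    score_fn: Callable[[Features], float],
    x: Features,
    baseline: Features,
    order: List[str],
) -> List[float]:
    """Scores after cumulatively resetting features to baseline in the given order."""
    current = dict(x)
    scores = [score_fn(current)]
    for name in order:
        current[name] = baseline[name]
        scores.append(score_fn(current))
    return scores


def toy_score(f: Features) -> float:
    # HYPOTHETICAL stand-in model: logistic function of a weighted sum.
    z = -2.0 + 1.2 * f["new_device"] + 1.0 * f["country_mismatch"] + 0.3 * f["amount_ratio"]
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical transaction and neutral baseline (illustrative values only).
x = {"new_device": 1.0, "country_mismatch": 1.0, "amount_ratio": 3.4}
baseline = {"new_device": 0.0, "country_mismatch": 0.0, "amount_ratio": 1.0}

# Order claimed by the explanation, most important first, and a control ordering.
explained_order = ["new_device", "country_mismatch", "amount_ratio"]
reversed_order = list(reversed(explained_order))

explained_curve = deletion_curve(toy_score, x, baseline, explained_order)
reversed_curve = deletion_curve(toy_score, x, baseline, reversed_order)

# A lower mean score along the curve means the score fell faster.
print(f"explained order mean score: {mean(explained_curve):.3f}")
print(f"reversed order mean score:  {mean(reversed_curve):.3f}")
```

In this toy setup the explanation's ordering produces the faster drop, which is what a faithful ranking should do. On the real system the same comparison would be run across many flagged transactions and against random orderings, since a single case cannot establish that the audit tool is trustworthy.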