ShopFlow uses a binary classifier to predict whether an order should be sent to manual fraud review before fulfillment. The customer is unhappy with current outcomes: analysts say too many reviewed orders are legitimate, while operations says some fraudulent orders are still shipping.

| Metric | Current Model @ 0.60 Threshold | Alternative @ 0.45 Threshold | Target / Constraint |
|---|---|---|---|
| Precision | 0.78 | 0.61 | Review team wants >0.70 |
| Recall | 0.42 | 0.68 | Risk team wants >0.60 |
| F1 Score | 0.55 | 0.64 | Higher is better |
| AUC-ROC | 0.86 | 0.86 | Model ranking unchanged |
| False Positive Rate | 0.9% | 2.1% | Minimize customer friction |
| Orders flagged/day | 1,150 | 2,050 | Team capacity: 1,600/day |
| Fraud prevalence | 1.8% | 1.8% | Stable |
| Score calibration error | 0.11 | 0.11 | Lower is better |

The customer asks whether they should tune the decision threshold, redesign the review workflow, or accept the current model behavior because the model itself may already be near its practical limit. You need to interpret the metrics and recommend the best path.
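
As a starting point for interpreting the table, the arithmetic behind the two operating points can be sketched in a few lines of Python. The inputs are copied from the table; the class and helper names (`OperatingPoint`, `summarize`) are hypothetical and chosen only for illustration. The per-day counts assume the reported precision applies uniformly to the daily flagged volume, and since the table values are rounded, the derived figures are approximate.

```python
from dataclasses import dataclass

# Targets and constraints taken from the table's last column.
PRECISION_TARGET = 0.70          # review team wants precision above this
RECALL_TARGET = 0.60             # risk team wants recall above this
REVIEW_CAPACITY_PER_DAY = 1600   # orders the review team can handle per day


@dataclass(frozen=True)
class OperatingPoint:
    """One threshold setting of the same model (hypothetical helper type)."""
    threshold: float
    precision: float
    recall: float
    flagged_per_day: int

    def f1(self) -> float:
        # Harmonic mean of precision and recall.
        return 2 * self.precision * self.recall / (self.precision + self.recall)

    def fraud_flagged_per_day(self) -> float:
        # Flagged orders that are actually fraudulent (true positives).
        return self.flagged_per_day * self.precision

    def legit_flagged_per_day(self) -> float:
        # Flagged orders that are legitimate (false positives).
        return self.flagged_per_day * (1 - self.precision)


def summarize(point: OperatingPoint) -> dict:
    """Collect the derived quantities and target checks for one threshold."""
    return {
        "threshold": point.threshold,
        "f1": round(point.f1(), 2),
        "fraud_flagged_per_day": round(point.fraud_flagged_per_day()),
        "legit_flagged_per_day": round(point.legit_flagged_per_day()),
        "meets_precision_target": point.precision > PRECISION_TARGET,
        "meets_recall_target": point.recall > RECALL_TARGET,
        "capacity_overflow": max(0, point.flagged_per_day - REVIEW_CAPACITY_PER_DAY),
    }


current = OperatingPoint(threshold=0.60, precision=0.78, recall=0.42, flagged_per_day=1150)
alternative = OperatingPoint(threshold=0.45, precision=0.61, recall=0.68, flagged_per_day=2050)

if __name__ == "__main__":
    for name, point in (("current", current), ("alternative", alternative)):
        print(name, summarize(point))
```

Under these assumptions, the sketch reproduces the table's F1 values and makes the trade-off explicit: the 0.60 threshold meets the precision target but misses the recall target, the 0.45 threshold does the reverse, and the 0.45 threshold also exceeds review capacity by 450 orders per day. Neither listed threshold satisfies both teams and the capacity limit at once.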