ShopShield runs a binary classification model that flags suspicious e-commerce orders for manual review before fulfillment. The customer success team reports that enterprise merchants keep citing the model's 96.2% accuracy while complaining that too many legitimate orders are flagged and delayed.

| Metric | Current Threshold | Previous Threshold |
|---|---|---|
| Accuracy | 96.2% | 97.1% |
| Precision | 41.7% | 55.4% |
| Recall | 78.1% | 61.3% |
| F1 Score | 54.3% | 58.0% |
| False Positive Rate | 3.4% | 1.8% |
| Orders flagged for review | 3,000 / 50,000 | 1,850 / 50,000 |

Confusion matrix at the current threshold (50,000 orders):

| | Predicted Fraud | Predicted Legitimate |
|---|---|---|
| Actual Fraud | 625 | 175 |
| Actual Legitimate | 2,375 | 46,825 |

A large merchant asks: "If your model is over 96% accurate, why are 2,375 good orders being flagged?" You need to explain this clearly, quantify the tradeoff, and recommend whether the current threshold is appropriate.
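
One way to ground the answer is to recompute the standard metrics directly from the confusion matrix counts. The sketch below is illustrative only: the function name `rates` and the "approve everything" baseline are assumptions added for this example, not part of ShopShield's tooling. It takes the four matrix cells as given and shows why accuracy says little when fraud is rare: only 800 of 50,000 orders (1.6%) are fraudulent, so a system that flags nothing would still score 98.4% accuracy. Note that the matrix yields about 94.9% accuracy, 20.8% precision, and a 4.8% false positive rate, which differ from the reported metrics table (only recall, 78.1%, matches); the sketch does not try to reconcile the two.

```python
# Counts copied from the confusion matrix above.
tp, fn = 625, 175        # actual fraud: flagged / missed
fp, tn = 2_375, 46_825   # actual legitimate: flagged / passed


def rates(tp, fn, fp, tn):
    """Standard binary-classification metrics from the four matrix cells."""
    total = tp + fn + fp + tn
    precision = tp / (tp + fp)   # share of flagged orders that are fraud
    recall = tp / (tp + fn)      # share of fraud that gets flagged
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "false_positive_rate": fp / (fp + tn),
        "fraud_base_rate": (tp + fn) / total,
        "flagged": tp + fp,
    }


m = rates(tp, fn, fp, tn)

# Hypothetical baseline: approve every order and flag nothing.
# Every legitimate order is "correct", every fraud order is missed.
baseline_accuracy = (fp + tn) / (tp + fn + fp + tn)

# Cost of the tradeoff: legitimate orders delayed per fraud order caught.
legit_flagged_per_fraud_caught = fp / tp

for name, value in m.items():
    print(f"{name}: {value:.4f}" if isinstance(value, float) else f"{name}: {value}")
print(f"baseline_accuracy: {baseline_accuracy:.4f}")
print(f"legit_flagged_per_fraud_caught: {legit_flagged_per_fraud_caught:.1f}")
```

Read this way, the merchant's 2,375 flagged good orders are under 5% of the 49,200 legitimate orders, yet they make up most of the 3,000-order review queue, about 3.8 legitimate orders for every fraudulent one caught. Whether that ratio is acceptable depends on the relative cost of a delayed order versus a missed fraud, which the source does not state.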