ShopNow uses two ML models in production: a classifier that detects fraudulent orders and a regression model that forecasts weekly demand for top-selling products. Product leaders are asking which metrics should drive the evaluation of each model, because recent reviews focused on a single score without weighing business cost.
| Model | Metric | Current Model | Prior Model | Notes |
|---|---|---|---|---|
| Fraud classifier | Precision | 0.91 | 0.84 | High quality alerts |
| Fraud classifier | Recall | 0.58 | 0.76 | Many fraud cases missed |
| Fraud classifier | F1-score | 0.71 | 0.80 | Precision-recall imbalance |
| Fraud classifier | Accuracy | 0.992 | 0.989 | Fraud rate is only 0.8% |
| Demand forecast | RMSE | 18.4 units | 22.7 units | Lower is better |
| Demand forecast | MAE | 11.2 units | 13.5 units | Median SKU volume is 95 |
| Demand forecast | Bias | +6.1 units | +1.8 units | Systematic over-forecasting |
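The accuracy row illustrates the core problem: at a 0.8% fraud rate, a model that flags nothing at all still scores over 99% accuracy. A minimal pure-Python sketch makes this concrete (the 10,000-order counts are hypothetical, chosen to match the 0.8% prevalence in the table):

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 from binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical stream: 10,000 orders, 80 fraudulent (0.8% prevalence),
# and a degenerate "classifier" that flags nothing as fraud.
y_true = [1] * 80 + [0] * 9920
y_pred = [0] * 10000
m = classification_metrics(y_true, y_pred)
# accuracy is 0.992 even though recall is 0.0 -- every fraud case is missed
```

This is why the fraud classifier's 0.992 accuracy says almost nothing on its own: it is barely above what a do-nothing baseline achieves.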
The fraud team says each false negative costs about $240 in chargebacks, while each false positive costs $8 in manual review and customer friction. The inventory team says over-forecasting creates holding cost, but under-forecasting causes stockouts and lost margin on high-demand SKUs.
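These per-error costs let the two fraud models be compared on expected dollars per order rather than on any single metric. A back-of-envelope sketch, assuming the table's precision/recall values and the 0.8% fraud prevalence carry over to the live order stream (a simplifying assumption, not a measured result):

```python
def expected_cost_per_order(precision, recall, prevalence,
                            c_fn=240.0, c_fp=8.0):
    """Expected fraud-handling cost per order, derived from
    precision, recall, and fraud prevalence.

    Costs ($240 per missed fraud, $8 per false alert) come from
    the fraud team's estimates."""
    fn_rate = prevalence * (1 - recall)               # missed fraud per order
    tp_rate = prevalence * recall                     # caught fraud per order
    fp_rate = tp_rate * (1 - precision) / precision   # false alerts per order
    return c_fn * fn_rate + c_fp * fp_rate

# Table values: current model (precision 0.91, recall 0.58),
# prior model (precision 0.84, recall 0.76), prevalence 0.008.
current = expected_cost_per_order(0.91, 0.58, 0.008)  # ~$0.81 per order
prior = expected_cost_per_order(0.84, 0.76, 0.008)    # ~$0.47 per order
```

Under these assumptions the prior model is cheaper per order despite its lower precision, because false negatives are 30x more expensive than false positives. The same cost-weighted framing applies to the forecast: the +6.1-unit bias should be priced against holding cost versus stockout margin, not judged by RMSE alone.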