ShopNow uses two ML models in production: a classifier that detects fraudulent orders and a regression model that forecasts weekly demand for top-selling products. Product leaders are asking which metrics should drive the evaluation of each model, because recent reviews focused on a single score without weighing business cost.
| Model | Metric | Current Model | Prior Model | Notes |
|---|---|---|---|---|
| Fraud classifier | Precision | 0.91 | 0.84 | High quality alerts |
| Fraud classifier | Recall | 0.58 | 0.76 | Many fraud cases missed |
| Fraud classifier | F1-score | 0.71 | 0.80 | Precision-recall imbalance |
| Fraud classifier | Accuracy | 0.992 | 0.989 | Fraud rate is only 0.8% |
| Demand forecast | RMSE | 18.4 units | 22.7 units | Lower is better |
| Demand forecast | MAE | 11.2 units | 13.5 units | Median SKU volume is 95 |
| Demand forecast | Bias | +6.1 units | +1.8 units | Systematic over-forecasting |
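The accuracy row illustrates the core problem: at a 0.8% fraud rate, a model that flags nothing at all still scores over 99% accuracy. A minimal pure-Python sketch makes this concrete (the 10,000-order counts are hypothetical, chosen to match the 0.8% prevalence in the table):

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 from binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical stream: 10,000 orders, 80 fraudulent (0.8% prevalence),
# and a degenerate "classifier" that flags nothing as fraud.
y_true = [1] * 80 + [0] * 9920
y_pred = [0] * 10000
m = classification_metrics(y_true, y_pred)
# accuracy is 0.992 even though recall is 0.0 -- every fraud case is missed
```

This is why the fraud classifier's 0.992 accuracy says almost nothing on its own: it is barely above what a do-nothing baseline achieves.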
The fraud team says each false negative costs about $240 in chargebacks, while each false positive costs $8 in manual review and customer friction. The inventory team says over-forecasting creates holding cost, but under-forecasting causes stockouts and lost margin on high-demand SKUs.
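These per-error costs let the two fraud models be compared on expected dollars per order rather than on any single metric. A back-of-envelope sketch, assuming the table's precision/recall values and the 0.8% fraud prevalence carry over to the live order stream (a simplifying assumption, not a measured result):

```python
def expected_cost_per_order(precision, recall, prevalence,
                            c_fn=240.0, c_fp=8.0):
    """Expected fraud-handling cost per order, derived from
    precision, recall, and fraud prevalence.

    Costs ($240 per missed fraud, $8 per false alert) come from
    the fraud team's estimates."""
    fn_rate = prevalence * (1 - recall)               # missed fraud per order
    tp_rate = prevalence * recall                     # caught fraud per order
    fp_rate = tp_rate * (1 - precision) / precision   # false alerts per order
    return c_fn * fn_rate + c_fp * fp_rate

# Table values: current model (precision 0.91, recall 0.58),
# prior model (precision 0.84, recall 0.76), prevalence 0.008.
current = expected_cost_per_order(0.91, 0.58, 0.008)  # ~$0.81 per order
prior = expected_cost_per_order(0.84, 0.76, 0.008)    # ~$0.47 per order
```

Under these assumptions the prior model is cheaper per order despite its lower precision, because false negatives are 30x more expensive than false positives. The same cost-weighted framing applies to the forecast: the +6.1-unit bias should be priced against holding cost versus stockout margin, not judged by RMSE alone.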