ShopLens is evaluating two binary classification models that predict whether a customer support ticket should be escalated to a human specialist. Escalations are expensive, but missing urgent tickets leads to SLA breaches and customer churn. The team wants to choose between two models with different precision and recall profiles.
Both models were evaluated on a holdout set of 10,000 tickets: 1,000 truly urgent (a 10% base rate) and 9,000 non-urgent.
| Metric | Model A | Model B |
|---|---|---|
| Precision | 0.91 | 0.68 |
| Recall | 0.54 | 0.86 |
| F1 Score | 0.68 | 0.76 |
| Accuracy | 0.95 | 0.95 |
| False Positives | 53 | 405 |
| False Negatives | 460 | 140 |
| Predicted Positive Tickets | 593 | 1,265 |
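
The derived rows of the table follow from the false-positive and false-negative counts plus the holdout class sizes. The sketch below recomputes them; the `Confusion` class and the `from_table` helper are illustrative names, not part of any ShopLens code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Confusion-matrix counts for one model on the holdout set."""
    tp: int
    fp: int
    fn: int
    tn: int

    @property
    def precision(self) -> float:
        # Share of escalated tickets that were truly urgent.
        return self.tp / (self.tp + self.fp)

    @property
    def recall(self) -> float:
        # Share of truly urgent tickets that were escalated.
        return self.tp / (self.tp + self.fn)

    @property
    def f1(self) -> float:
        # Harmonic mean of precision and recall, written in terms of counts.
        return 2 * self.tp / (2 * self.tp + self.fp + self.fn)

    @property
    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.fn + self.tn)


def from_table(fp: int, fn: int, positives: int = 1_000, negatives: int = 9_000) -> Confusion:
    """Build the full matrix from the table's FP/FN counts and the holdout class sizes."""
    return Confusion(tp=positives - fn, fp=fp, fn=fn, tn=negatives - fp)


model_a = from_table(fp=53, fn=460)
model_b = from_table(fp=405, fn=140)

for name, m in (("A", model_a), ("B", model_b)):
    print(
        f"Model {name}: precision={m.precision:.3f} recall={m.recall:.3f} "
        f"f1={m.f1:.3f} accuracy={m.accuracy:.3f} predicted_positive={m.tp + m.fp}"
    )
```

Recomputing accuracy this way is a useful cross-check, since accuracy is fully determined by the false-positive and false-negative counts.
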
Model A is far more precise (0.91 vs. 0.68) but misses 460 of the 1,000 urgent tickets. Model B misses only 140, but it escalates 405 non-urgent tickets to specialists, compared with 53 for Model A. The hiring manager wants to know how you would compare these models, which one you would recommend, and whether threshold tuning could produce a better operating point.
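
One way to frame the comparison is expected cost: weight each false positive by the cost of an unnecessary escalation and each false negative by the cost of a missed urgent ticket. The sketch below is a minimal illustration under stated assumptions. The per-error costs (`cost_fp`, `cost_fn`) are hypothetical placeholders, since the scenario gives no figures, and the toy scores passed to `best_threshold` are invented purely to show the mechanics of a threshold sweep.

```python
from typing import Dict, Sequence, Tuple

# False-positive and false-negative counts taken from the table.
COUNTS: Dict[str, Dict[str, int]] = {
    "A": {"fp": 53, "fn": 460},
    "B": {"fp": 405, "fn": 140},
}


def expected_cost(fp: int, fn: int, cost_fp: float, cost_fn: float) -> float:
    """Total error cost on the holdout set for the given per-error costs."""
    return fp * cost_fp + fn * cost_fn


def break_even_ratio(a: Dict[str, int], b: Dict[str, int]) -> float:
    """Ratio cost_fn / cost_fp at which models a and b cost the same.

    Switching from a to b adds (b.fp - a.fp) false positives and
    removes (a.fn - b.fn) false negatives.
    """
    return (b["fp"] - a["fp"]) / (a["fn"] - b["fn"])


def best_threshold(
    scores: Sequence[float], labels: Sequence[int], cost_fp: float, cost_fn: float
) -> Tuple[float, float]:
    """Sweep candidate thresholds and return (threshold, cost) with the lowest cost.

    A ticket is escalated when score >= threshold. A threshold of +inf
    means nothing is escalated. Ties go to the lowest threshold.
    """
    best_t, best_cost = float("inf"), float("inf")
    for t in sorted(set(scores)) + [float("inf")]:
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        cost = expected_cost(fp, fn, cost_fp, cost_fn)
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t, best_cost


# Hypothetical costs: a missed urgent ticket is assumed 5x as costly
# as an unnecessary escalation.
cost_a = expected_cost(**COUNTS["A"], cost_fp=1.0, cost_fn=5.0)
cost_b = expected_cost(**COUNTS["B"], cost_fp=1.0, cost_fn=5.0)
ratio = break_even_ratio(COUNTS["A"], COUNTS["B"])

# Toy scores and labels, invented only to exercise the sweep.
toy_scores = [0.95, 0.80, 0.60, 0.40, 0.30, 0.10]
toy_labels = [1, 1, 0, 1, 0, 0]
toy_best = best_threshold(toy_scores, toy_labels, cost_fp=1.0, cost_fn=5.0)
```

With the table's counts, the break-even ratio is 352 / 320 = 1.1: Model B has the lower expected cost whenever a missed urgent ticket costs more than about 1.1 times an unnecessary escalation, and Model A wins otherwise. If per-ticket scores are available for either model, the same cost function can be swept across thresholds to look for an operating point between the two profiles shown in the table.
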