Abzooba has built a binary classifier in its customer support workflow to predict whether an incoming enterprise support ticket should be escalated to a senior resolution team. The current model is used in production to prioritize tickets in Abzooba’s support intelligence stack, but stakeholders disagree on whether accuracy, precision, recall, or F1 should be the primary success metric.
The model was evaluated on 10,000 labeled tickets, of which 1,200 truly required escalation.
| Metric | Value |
|---|---|
| Accuracy | 0.90 |
| Precision | 0.68 |
| Recall | 0.57 |
| F1 Score | 0.62 |
| Escalation rate predicted by model | 10.0% |
| Actual escalation rate | 12.0% |
Confusion matrix counts:
| | Predicted Escalate | Predicted Do Not Escalate |
|---|---|---|
| Actual Escalate | 684 | 516 |
| Actual Do Not Escalate | 316 | 8,484 |
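
As a sanity check, the summary metrics can be recomputed directly from the four confusion-matrix counts. The sketch below uses only the counts given above; the variable names are illustrative. Note that the accuracy implied by these counts comes out a little higher than the 0.90 in the metrics table, while precision, recall, F1, and both escalation rates match after rounding, so it may be worth confirming which accuracy figure is authoritative.

```python
# Confusion-matrix counts from the evaluation set.
tp, fn = 684, 516    # actual escalate: caught / missed
fp, tn = 316, 8484   # actual do-not-escalate: wrongly escalated / correctly left alone

total = tp + fn + fp + tn

accuracy = (tp + tn) / total
precision = tp / (tp + fp)          # share of predicted escalations that were correct
recall = tp / (tp + fn)             # share of true escalations that were caught
f1 = 2 * precision * recall / (precision + recall)
predicted_rate = (tp + fp) / total  # escalation rate predicted by the model
actual_rate = (tp + fn) / total     # actual escalation rate

metrics = {
    "accuracy": accuracy,
    "precision": precision,
    "recall": recall,
    "f1": f1,
    "predicted escalation rate": predicted_rate,
    "actual escalation rate": actual_rate,
}

for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```
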
The Head of Support points to roughly 90% accuracy and argues that the model is strong. The senior operations manager is more concerned that the model misses 516 of the 1,200 high-priority tickets that truly required escalation (about 43%), which causes SLA breaches and delayed responses to enterprise customers. Meanwhile, the support team wants to limit unnecessary escalations, such as the 316 false positives in the evaluation set, because senior agents are capacity-constrained.
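
One way to put the accuracy argument in context is to compare it with a trivial baseline. The sketch below assumes a hypothetical "never escalate" rule, which is not part of the case description and is used here only for illustration. Because only 12% of tickets need escalation, that rule already scores 88% accuracy while catching no escalations at all, which is why accuracy alone says little about the two concerns raised above.

```python
total_tickets = 10_000
actual_escalations = 1_200

# Hypothetical baseline (an assumption for illustration):
# predict "do not escalate" for every ticket.
baseline_tp = 0
baseline_tn = total_tickets - actual_escalations
baseline_accuracy = (baseline_tp + baseline_tn) / total_tickets
baseline_recall = baseline_tp / actual_escalations

# Figures reported for the production model.
reported_accuracy = 0.90
missed_escalations = 516        # false negatives
unnecessary_escalations = 316   # false positives
predicted_escalations = 684 + 316

# The two stakeholder concerns expressed as rates.
miss_rate = missed_escalations / actual_escalations
unnecessary_share = unnecessary_escalations / predicted_escalations

print(f"baseline accuracy:           {baseline_accuracy:.2f}")
print(f"baseline recall:             {baseline_recall:.2f}")
print(f"reported model accuracy:     {reported_accuracy:.2f}")
print(f"missed true escalations:     {miss_rate:.1%}")
print(f"unnecessary among escalated: {unnecessary_share:.1%}")
```

The miss rate is the complement of recall and the unnecessary share is the complement of precision, so the operations manager's concern maps to recall and the senior team's capacity concern maps to precision.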