Choose Metrics for Ticket Escalation

Context

Abzooba has built a binary classifier in its customer support workflow to predict whether an incoming enterprise support ticket should be escalated to a senior resolution team. The current model is used in production to prioritize tickets in Abzooba’s support intelligence stack, but stakeholders disagree on whether accuracy, precision, recall, or F1 should be the primary success metric.

Current Performance

The model was evaluated on 10,000 labeled tickets, of which 1,200 truly required escalation.

Metric	Value
Accuracy	0.90
Precision	0.68
Recall	0.57
F1 Score	0.62
Escalation rate predicted by model	10.0%
Actual escalation rate	12.0%

Confusion matrix counts:

	Predicted Escalate	Predicted Do Not Escalate
Actual Escalate	684	516
Actual Do Not Escalate	316	8,484
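
As a sanity check, the reported metrics can be recomputed from the confusion matrix counts. The sketch below is illustrative only; the function name `metrics_from_counts` is hypothetical and not part of the original problem. With these counts, precision, recall, and F1 match the table after rounding, while accuracy works out to roughly 0.917, slightly above the 0.90 reported.

```python
def metrics_from_counts(tp, fn, fp, tn):
    """Compute standard binary-classification metrics from confusion counts."""
    total = tp + fn + fp + tn
    precision = tp / (tp + fp)   # share of predicted escalations that were correct
    recall = tp / (tp + fn)      # share of true escalations that were caught
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "predicted_rate": (tp + fp) / total,  # fraction the model escalates
        "actual_rate": (tp + fn) / total,     # fraction that truly needed escalation
    }

# Counts taken from the confusion matrix above.
m = metrics_from_counts(tp=684, fn=516, fp=316, tn=8484)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```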

The Problem

The Head of Support points to 90% accuracy and argues the model is strong. The senior operations manager is more concerned that 516 high-priority tickets are missed, causing SLA breaches and delayed enterprise responses. Meanwhile, the support team also wants to avoid too many unnecessary escalations because senior agents are capacity-constrained.

Your Task

Explain how you would choose between accuracy, precision, recall, and F1 for this use case.
Interpret the current metrics in business terms, not just mathematically.
Identify which metric should be optimized first under the stated constraints.
Recommend what additional analysis you would run before changing the model or threshold.
Propose concrete next steps to improve performance.

Constraints

Senior escalation team can handle at most 1,300 tickets/day.
Missing a true escalation is estimated to cost 8x more than an unnecessary escalation.
Product leadership wants a single primary metric for model reporting.
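
The 8x cost asymmetry can be made concrete by scoring confusion counts in units of one unnecessary escalation. This is a minimal sketch, assuming a linear cost model in which each missed escalation costs 8 units, each unnecessary escalation costs 1 unit, and correct decisions cost nothing. The function `total_cost` and the two trivial baselines are illustrative additions, not part of the original problem.

```python
FN_COST = 8  # missed true escalation, per the stated 8x estimate
FP_COST = 1  # unnecessary escalation (unit cost; an assumption used for scaling)

def total_cost(fn, fp):
    """Total cost in 'unnecessary escalation' units under the assumed linear model."""
    return FN_COST * fn + FP_COST * fp

# Class totals from the evaluation set: 1,200 true escalations, 8,800 others.
actual_pos, actual_neg = 1200, 8800

current = total_cost(fn=516, fp=316)             # model at its current threshold
escalate_none = total_cost(fn=actual_pos, fp=0)  # baseline: never escalate
escalate_all = total_cost(fn=0, fp=actual_neg)   # baseline: always escalate

miss_share = FN_COST * 516 / current             # fraction of cost from missed tickets

print(current, escalate_none, escalate_all)
print(f"share of cost from misses: {miss_share:.1%}")
```

Under this assumed cost model, missed escalations account for 4,128 of the 4,444 cost units at the current threshold, which suggests false negatives drive most of the cost. The capacity limit of 1,300 tickets/day is not checked here because the problem does not state how many days the 10,000 evaluation tickets span.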
