Compare Precision-Recall Tradeoffs

Context

ShopLens is evaluating two binary classification models that predict whether a customer support ticket should be escalated to a human specialist. Escalations are expensive, but missing urgent tickets leads to SLA breaches and customer churn. The team wants to choose between two models with different precision and recall profiles.

Current Performance

Evaluation was run on a holdout set of 10,000 tickets, with 1,000 truly urgent tickets and 9,000 non-urgent tickets.

Metric	Model A	Model B
Precision	0.91	0.68
Recall	0.54	0.86
F1 Score	0.68	0.76
Accuracy	0.95	0.95
False Positives	53	405
False Negatives	460	140
Predicted Positive Tickets	593	1,265
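
As a sanity check, the ratio metrics in the table can be recomputed from the false positive and false negative counts. The sketch below assumes only the 1,000 urgent / 9,000 non-urgent split stated above; the `Counts` class and `derive_metrics` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass

URGENT = 1_000       # truly urgent tickets in the holdout set
NON_URGENT = 9_000   # non-urgent tickets in the holdout set


@dataclass(frozen=True)
class Counts:
    """Error counts for one model, as listed in the table."""
    false_positives: int
    false_negatives: int


def derive_metrics(c: Counts) -> dict:
    """Recompute the ratio metrics from the FP/FN counts."""
    tp = URGENT - c.false_negatives        # urgent tickets caught
    tn = NON_URGENT - c.false_positives    # non-urgent tickets left alone
    predicted_positive = tp + c.false_positives
    precision = tp / predicted_positive
    recall = tp / URGENT
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (URGENT + NON_URGENT)
    return {
        "predicted_positive": predicted_positive,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
    }


for name, counts in {"Model A": Counts(53, 460), "Model B": Counts(405, 140)}.items():
    metrics = derive_metrics(counts)
    print(name, {k: round(v, 4) for k, v in metrics.items()})
```

Recomputing from counts is a cheap way to catch a metric that does not agree with the rest of the table before drawing conclusions from it.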

The Problem

Model A is much more precise but misses many urgent tickets. Model B catches most urgent tickets but sends many more non-urgent tickets to specialists. The hiring manager wants to know how you would compare these models, which one you would recommend, and whether threshold tuning could produce a better operating point.

Requirements

Compare the two models using the provided metrics and explain the tradeoff clearly.
Identify which model is better if the business prioritizes minimizing missed urgent tickets.
Identify which model is better if specialist review capacity is limited.
Explain whether F1 score alone is sufficient for this decision.
Recommend a threshold or evaluation approach to align model choice with business cost.
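
One possible way to approach the last requirement is a threshold sweep that scores each candidate threshold by total error cost and discards any that escalate more tickets than the team can review. This is a minimal sketch, not the expected answer: the problem provides no per-ticket scores, so the `scores` and `labels` below are synthetic toy data, and `pick_threshold` is a hypothetical helper. Cost and capacity figures are passed in as parameters.

```python
from typing import Sequence, Tuple


def pick_threshold(
    scores: Sequence[float],
    labels: Sequence[int],
    fp_cost: float,
    fn_cost: float,
    max_escalations: int,
) -> Tuple[float, float]:
    """Return (threshold, total_cost) with the lowest cost within capacity.

    A ticket is escalated when its score is >= threshold. The candidate
    float("inf") escalates nothing, so a feasible option always exists.
    """
    best = None
    for threshold in sorted(set(scores)) + [float("inf")]:
        fp = fn = escalations = 0
        for score, label in zip(scores, labels):
            escalated = score >= threshold
            escalations += escalated
            if escalated and label == 0:
                fp += 1   # non-urgent ticket sent to a specialist
            elif not escalated and label == 1:
                fn += 1   # urgent ticket missed
        if escalations > max_escalations:
            continue      # violates the review-capacity limit
        cost = fp * fp_cost + fn * fn_cost
        if best is None or cost < best[1]:
            best = (threshold, cost)
    return best


# Synthetic toy data (label 1 = urgent); not taken from the problem.
scores = [0.95, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

print(pick_threshold(scores, labels, fp_cost=8, fn_cost=120, max_escalations=4))
print(pick_threshold(scores, labels, fp_cost=8, fn_cost=120, max_escalations=8))
```

On the toy data, loosening the capacity limit lets the sweep pick a lower threshold, which shows how the capacity constraint and the cost ratio jointly determine the operating point.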

Constraints

Each false positive escalation costs about $8 in specialist time.
Each false negative costs about $120 in SLA penalties and churn risk.
The specialist team can review at most 1,000 escalated tickets per day.
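
The cost figures above can be combined with the error counts from the table to put both models on a single dollar scale. The sketch below makes one assumption the problem does not state: it treats the 10,000-ticket holdout set as one day's volume, so that predicted positives can be compared directly against the daily review limit. The `error_cost` helper is a hypothetical name.

```python
FP_COST = 8        # dollars per false positive escalation
FN_COST = 120      # dollars per missed urgent ticket
CAPACITY = 1_000   # escalated tickets the team can review per day

# Counts copied from the performance table.
models = {
    "Model A": {"fp": 53, "fn": 460, "predicted_positive": 593},
    "Model B": {"fp": 405, "fn": 140, "predicted_positive": 1_265},
}


def error_cost(fp: int, fn: int) -> int:
    """Total dollar cost of a model's errors on the holdout set."""
    return fp * FP_COST + fn * FN_COST


for name, m in models.items():
    cost = error_cost(m["fp"], m["fn"])
    # Assumption: the holdout set stands in for one day of tickets.
    within_capacity = m["predicted_positive"] <= CAPACITY
    print(f"{name}: error cost ${cost:,}, within capacity: {within_capacity}")

# Model A: error cost $55,624, within capacity: True
# Model B: error cost $20,040, within capacity: False
```

Under these assumptions the cheaper model on error cost is also the one that exceeds the review limit, which is why neither F1 nor cost alone settles the choice.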
