Evaluate Precision-Recall for Spam Filtering

Easy
Model Evaluation
Precision, Recall, F1 Score

Problem

Context

InboxShield at MailFlow uses a binary classifier to detect spam emails and route them to the spam folder. After a recent threshold change, customer complaints about missed spam dropped, but complaints about legitimate emails being hidden increased.

Current Performance

Metric                       Before Threshold Change   Current   Change
Precision                    0.92                      0.78      -0.14
Recall                       0.61                      0.86      +0.25
F1 Score                     0.73                      0.82      +0.09
Accuracy                     0.97                      0.95      -0.02
False Positive Rate          0.4%                      1.8%      +1.4 pts
Emails flagged as spam/day   12,400                    21,800    +9,400
Actual spam/day              15,600                    15,600    0
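
The F1 Score row can be checked against the Precision and Recall rows, since F1 is the harmonic mean of the two. A minimal sketch in Python; the function name `f1_score` is chosen here for illustration and is not part of the original problem:

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both are 0
    return 2 * precision * recall / (precision + recall)


# Precision and recall values taken from the table above.
f1_before = f1_score(0.92, 0.61)
f1_current = f1_score(0.78, 0.86)

print(round(f1_before, 2), round(f1_current, 2))  # prints: 0.73 0.82
```

F1 weights precision and recall equally, so a higher F1 does not by itself show that the new threshold is better when false positives and false negatives carry different business costs.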

The Problem

Leadership wants a clear explanation of what precision and recall mean in this setting, why both matter, and whether the new threshold is actually better for the business. The team must decide if they should keep the current threshold, revert it, or tune it differently for a better tradeoff.

Requirements

  1. Define precision and recall using the numbers above.
  2. Explain why improving recall caused precision to fall.
  3. Interpret whether the higher F1 score means the current model is better overall.
  4. Use the confusion matrix implied by these metrics to discuss the business impact of false positives versus false negatives.
  5. Recommend a threshold strategy and what additional analysis you would run before deployment.
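
As a starting point for requirements 1 and 4, the standard definitions can be written in terms of confusion matrix counts. The sketch below is illustrative only: the class name `ConfusionCounts` and the counts passed to it are hypothetical and are not derived from the table above.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # spam correctly routed to the spam folder
    fp: int  # legitimate email wrongly routed to the spam folder
    fn: int  # spam that reached the inbox
    tn: int  # legitimate email delivered to the inbox

    def precision(self) -> float:
        """Of all emails flagged as spam, the fraction that is really spam."""
        flagged = self.tp + self.fp
        return self.tp / flagged if flagged else 0.0

    def recall(self) -> float:
        """Of all actual spam, the fraction that was flagged."""
        actual_spam = self.tp + self.fn
        return self.tp / actual_spam if actual_spam else 0.0

    def false_positive_rate(self) -> float:
        """Of all legitimate email, the fraction wrongly flagged."""
        legitimate = self.fp + self.tn
        return self.fp / legitimate if legitimate else 0.0


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
example = ConfusionCounts(tp=80, fp=20, fn=20, tn=880)
print(example.precision(), example.recall())  # prints: 0.8 0.8
```

Precision is hurt only by false positives and recall only by false negatives, which is why the two constraints listed for this problem map onto different metrics.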

Constraints

  • False positives hide legitimate customer emails, increasing support tickets and churn risk.
  • False negatives let spam into inboxes, reducing trust in the product.
  • The product team can only support one global threshold in the next release cycle.
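
Because only one global threshold is available, one way to compare candidates is to assign a cost to each error type and pick the threshold with the lowest total cost on a labeled validation set. The sketch below assumes this cost-weighted approach; the scores, labels, candidate thresholds, and cost values are all hypothetical, and the function names are made up for illustration.

```python
from typing import List, Tuple

# Each item is (classifier score, is_spam). Hypothetical validation data.
EMAILS: List[Tuple[float, bool]] = [
    (0.95, True), (0.90, True), (0.80, True), (0.70, False), (0.60, True),
    (0.40, False), (0.30, True), (0.20, False), (0.10, False), (0.05, False),
]


def confusion(emails: List[Tuple[float, bool]], threshold: float) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) when emails scoring >= threshold are flagged as spam."""
    tp = sum(1 for score, spam in emails if score >= threshold and spam)
    fp = sum(1 for score, spam in emails if score >= threshold and not spam)
    fn = sum(1 for score, spam in emails if score < threshold and spam)
    tn = sum(1 for score, spam in emails if score < threshold and not spam)
    return tp, fp, fn, tn


def total_cost(emails, threshold: float, cost_fp: float, cost_fn: float) -> float:
    """Weighted error cost at a given threshold (cost units are an assumption)."""
    _, fp, fn, _ = confusion(emails, threshold)
    return fp * cost_fp + fn * cost_fn


def best_threshold(emails, candidates: List[float], cost_fp: float, cost_fn: float) -> float:
    """Pick the candidate threshold with the lowest weighted error cost."""
    return min(candidates, key=lambda t: total_cost(emails, t, cost_fp, cost_fn))


# Lowering the threshold flags more email: recall rises while precision falls.
for t in (0.75, 0.5, 0.25):
    tp, fp, fn, tn = confusion(EMAILS, t)
    print(t, "precision", round(tp / (tp + fp), 2), "recall", round(tp / (tp + fn), 2))
```

With these hypothetical numbers, `best_threshold(EMAILS, [0.25, 0.5, 0.75], cost_fp=5, cost_fn=1)` selects 0.75, while swapping the two costs selects 0.25. The recommended threshold therefore depends on how the business prices a hidden legitimate email against a missed spam message.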
