InboxShield uses a binary classifier to detect spam emails for a workplace email product. The team is deciding whether to keep the current threshold after complaints that too many spam emails still reach users' inboxes and that some legitimate emails are incorrectly filtered.

| Metric | Current Model |
|---|---|
| Accuracy | 0.962 |
| Precision | 0.78 |
| Recall | 0.62 |
| F1 Score | 0.69 |
| AUC-ROC | 0.91 |
| Spam prevalence | 8% |
| Threshold | 0.50 |

On a validation set of 50,000 emails:

| | Predicted Spam | Predicted Not Spam |
|---|---|---|
| Actual Spam | 2,480 | 1,520 |
| Actual Not Spam | 700 | 45,300 |
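
The headline metrics can be recomputed directly from the four confusion-matrix counts above. The following is a minimal sketch in plain Python; the variable names (`tp`, `fn`, `fp`, `tn`) are illustrative choices, not part of InboxShield's codebase.

```python
# Counts copied from the validation confusion matrix (50,000 emails).
tp, fn = 2_480, 1_520   # actual spam: flagged vs. missed
fp, tn = 700, 45_300    # actual not spam: wrongly filtered vs. delivered

total = tp + fn + fp + tn

accuracy = (tp + tn) / total
precision = tp / (tp + fp)   # of emails flagged as spam, the share that was spam
recall = tp / (tp + fn)      # of actual spam, the share that was flagged
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
prevalence = (tp + fn) / total

metrics = {
    "accuracy": accuracy,
    "precision": precision,
    "recall": recall,
    "f1": f1,
    "prevalence": prevalence,
}
for name, value in metrics.items():
    print(f"{name:<10} {value:.3f}")
```

Precision, recall, F1, and prevalence computed this way round to the values in the metrics table. Accuracy from these counts comes out at about 0.956, slightly below the reported 0.962, so the two sources are worth reconciling before the figure is quoted.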

The product manager points to the 96.2% accuracy and argues the model is already strong. The support team disagrees because users still report missed spam (false negatives), especially phishing-style messages. You need to explain what the F1 score of 0.69 means here and whether it is a better metric than accuracy for this use case.
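
One way to put the accuracy figure in context is to compare it with a trivial baseline. The sketch below assumes a hypothetical classifier that never flags anything as spam and scores it on the same 50,000-email validation set; `f1_from_counts` is a helper introduced here for illustration only.

```python
# Hypothetical baseline: every email is labelled "not spam".
total = 50_000
actual_spam = 2_480 + 1_520          # 4,000 emails, i.e. 8% prevalence
actual_not_spam = total - actual_spam

# With nothing flagged, there are no positives of either kind.
tp, fp = 0, 0
fn, tn = actual_spam, actual_not_spam


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 as 2*TP / (2*TP + FP + FN); returns 0.0 if the denominator is zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


baseline_accuracy = (tp + tn) / total
baseline_f1 = f1_from_counts(tp, fp, fn)
model_f1 = f1_from_counts(2_480, 700, 1_520)   # current model's counts

print(f"baseline accuracy: {baseline_accuracy:.3f}")
print(f"baseline F1:       {baseline_f1:.3f}")
print(f"model F1:          {model_f1:.3f}")
```

Because only 8% of emails are spam, this do-nothing baseline already reaches 92% accuracy while catching no spam at all, so the reported 96.2% is only about four points above it. F1 ignores true negatives and drops to zero for the baseline, which is why it separates the two far more clearly on the spam class.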