InboxShield at MailFlow uses a binary classifier to detect spam emails and route them to the spam folder. After a recent threshold change, customer complaints about missed spam dropped, but complaints about legitimate emails being hidden in the spam folder increased.

| Metric | Before Threshold Change | Current | Change |
|---|---|---|---|
| Precision | 0.92 | 0.78 | -0.14 |
| Recall | 0.61 | 0.86 | +0.25 |
| F1 Score | 0.73 | 0.82 | +0.09 |
| Accuracy | 0.97 | 0.95 | -0.02 |
| False Positive Rate | 0.4% | 1.8% | +1.4 pts |
| Emails flagged as spam/day | 12,400 | 21,800 | +9,400 |
| Actual spam/day | 15,600 | 15,600 | 0 |
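
The F1 row can be cross-checked from the precision and recall rows, since F1 is their harmonic mean. The sketch below does that and also estimates how much spam reaches the inbox each day, taking the recall and actual-spam rows at face value. The function names `f1_score` and `missed_spam_per_day` are illustrative, not part of InboxShield.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def missed_spam_per_day(recall: float, actual_spam: int) -> int:
    """Spam that reaches the inbox: actual spam minus the share that was caught."""
    caught = round(recall * actual_spam)
    return actual_spam - caught


# Precision, recall, and actual spam/day are taken from the table above.
ACTUAL_SPAM_PER_DAY = 15_600
settings = {
    "before": {"precision": 0.92, "recall": 0.61},
    "current": {"precision": 0.78, "recall": 0.86},
}

for name, m in settings.items():
    f1 = f1_score(m["precision"], m["recall"])
    missed = missed_spam_per_day(m["recall"], ACTUAL_SPAM_PER_DAY)
    print(f"{name}: F1={f1:.2f}, missed spam/day={missed:,}")
```

Rounded to two decimals, this reproduces the 0.73 and 0.82 in the F1 row. Under the same assumptions, missed spam falls from roughly 6,084 to 2,184 emails per day, which matches the drop in missed-spam complaints.
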

Leadership wants a clear explanation of what precision and recall mean in this setting, why both matter, and whether the new threshold is actually better for the business. The team must decide whether to keep the current threshold, revert to the previous one, or tune it to a different value that gives a better tradeoff.
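
To make the two metrics concrete: precision is the share of flagged emails that really are spam, and recall is the share of real spam that gets flagged. The sketch below computes both at several thresholds on a small hypothetical dataset. The scores and labels are invented for illustration and are not MailFlow data, and the rule "flag when score >= threshold" is an assumption about how the classifier applies its threshold.

```python
from typing import NamedTuple, Sequence


class Confusion(NamedTuple):
    tp: int  # spam sent to the spam folder
    fp: int  # legitimate email sent to the spam folder
    fn: int  # spam that reached the inbox
    tn: int  # legitimate email that reached the inbox


def confusion_at(scores: Sequence[float], labels: Sequence[int],
                 threshold: float) -> Confusion:
    """Count outcomes when every email scoring >= threshold is flagged."""
    tp = fp = fn = tn = 0
    for score, is_spam in zip(scores, labels):
        flagged = score >= threshold
        if flagged and is_spam:
            tp += 1
        elif flagged:
            fp += 1
        elif is_spam:
            fn += 1
        else:
            tn += 1
    return Confusion(tp, fp, fn, tn)


def precision(c: Confusion) -> float:
    """Of the emails flagged as spam, the fraction that are spam."""
    flagged = c.tp + c.fp
    return c.tp / flagged if flagged else 0.0


def recall(c: Confusion) -> float:
    """Of the emails that are spam, the fraction that were flagged."""
    actual = c.tp + c.fn
    return c.tp / actual if actual else 0.0


# Hypothetical spam scores and true labels (1 = spam, 0 = legitimate).
SCORES = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10, 0.05]
LABELS = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]

for threshold in (0.75, 0.50, 0.25):
    c = confusion_at(SCORES, LABELS, threshold)
    print(f"threshold={threshold:.2f} "
          f"precision={precision(c):.2f} recall={recall(c):.2f} "
          f"legit emails hidden={c.fp} spam missed={c.fn}")
```

On this toy data, each lower threshold catches more spam (higher recall) and hides more legitimate email (lower precision). That is the same direction of movement as in the table. Running a sweep like this on labeled production data, and weighting the two error types by their business cost, is one way to compare candidate thresholds.
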