ShopFlow uses a binary classifier to predict whether a customer support ticket should be escalated to the specialist operations team. The new gradient-boosted model replaced a logistic regression baseline in a shadow test, but leadership cares about operational improvement, not just offline lift. Specialist capacity is fixed, and missed escalations create SLA breaches and customer churn risk.

| Metric | Baseline Model | New Model | Change |
|---|---|---|---|
| Accuracy | 0.842 | 0.861 | +0.019 |
| Precision | 0.610 | 0.540 | -0.070 |
| Recall | 0.420 | 0.680 | +0.260 |
| F1 Score | 0.497 | 0.602 | +0.105 |
| AUC-ROC | 0.781 | 0.844 | +0.063 |
| Log Loss | 0.412 | 0.356 | -0.056 |
| Daily escalations predicted | 820 | 1,480 | +660 |
| Daily true escalation need | 1,200 | 1,200 | 0 |
| Avg SLA breaches/day | 696 | 384 | -312 |
| Specialist review capacity/day | 1,300 | 1,300 | 0 |

The new classifier finds more truly urgent tickets (recall rises from 0.420 to 0.680), but it also flags 1,480 tickets per day, 180 more than the 1,300 that specialists can review. You need to determine whether the model is actually improving operations or simply shifting work downstream.
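
One way to frame that question is to recompute missed escalations after applying the capacity cap. The Python sketch below is illustrative only. It takes recall, daily flagged volume, true need, and capacity from the table, and the names `ModelStats` and `capacity_adjusted` are hypothetical. It rests on two assumptions that the source does not state: flagged tickets are reviewed in arbitrary order, so the reviewed subset has the same hit rate as the full flagged set, and any flagged ticket left unreviewed counts as a missed escalation. True positives are derived from recall; deriving them from precision gives slightly different counts because the table values are rounded.

```python
from dataclasses import dataclass

TRUE_NEED_PER_DAY = 1200  # "Daily true escalation need" from the table
CAPACITY_PER_DAY = 1300   # "Specialist review capacity/day" from the table


@dataclass(frozen=True)
class ModelStats:
    name: str
    recall: float
    flagged_per_day: int  # "Daily escalations predicted" from the table


BASELINE = ModelStats("baseline", recall=0.420, flagged_per_day=820)
NEW_MODEL = ModelStats("new", recall=0.680, flagged_per_day=1480)


def capacity_adjusted(m: ModelStats) -> dict:
    """Estimate daily outcomes once the specialist capacity cap is applied."""
    true_pos = m.recall * TRUE_NEED_PER_DAY      # urgent tickets the model flags
    false_pos = m.flagged_per_day - true_pos     # non-urgent tickets it flags
    reviewed = min(m.flagged_per_day, CAPACITY_PER_DAY)
    overflow = m.flagged_per_day - reviewed      # flagged but never reviewed

    # ASSUMPTION: review order is arbitrary, so the reviewed subset contains
    # true and false positives in the same proportion as the flagged set.
    reviewed_share = reviewed / m.flagged_per_day

    caught = true_pos * reviewed_share
    return {
        "overflow": overflow,
        "caught": caught,
        # ASSUMPTION: unflagged and unreviewed urgent tickets are both misses.
        "missed": TRUE_NEED_PER_DAY - caught,
        "false_pos_reviewed": false_pos * reviewed_share,
    }


if __name__ == "__main__":
    for model in (BASELINE, NEW_MODEL):
        result = capacity_adjusted(model)
        summary = ", ".join(f"{k}={v:.0f}" for k, v in result.items())
        print(f"{model.name}: {summary}")
```

Under these assumptions the new model still misses fewer urgent tickets per day than the baseline (roughly 483 versus 696), but it spends roughly 583 review slots on false positives versus 316. The 384 breaches per day in the table match the uncapped case, where every flagged ticket is reviewed. A different review policy, such as working the queue in descending score order, would change these numbers.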