StreamWave uses a gradient boosting classifier to predict which subscribers are likely to cancel within 30 days so the retention team can send discounts. The model performed well during training, but leadership is concerned that offline results may not generalize because recent campaign performance has been weaker than expected.

| Metric | Training Set | Validation Set | Gap (Validation - Training) |
|---|---|---|---|
| Accuracy | 0.94 | 0.81 | -0.13 |
| Precision | 0.91 | 0.68 | -0.23 |
| Recall | 0.89 | 0.61 | -0.28 |
| F1 Score | 0.90 | 0.64 | -0.26 |
| AUC-ROC | 0.97 | 0.76 | -0.21 |
| Log Loss | 0.18 | 0.49 | +0.31 |
| Positive rate (churn) | 0.22 | 0.21 | -0.01 |

The data science manager wants to know whether this model is overfitting, how confident you are in that diagnosis, and what should be changed before the next deployment. Focus on interpreting the gap between training and validation performance rather than proposing a brand-new modeling approach.
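
As a starting point for interpreting the gap, the arithmetic behind the table can be checked with a short script. This is a minimal sketch, not part of StreamWave's pipeline: the metric values are copied from the table, while the helper names (`gap_report`, `f1_from`, `majority_baseline_accuracy`) and the choice to report a relative gap are assumptions made for illustration.

```python
# Metric values copied from the table above: (training, validation).
METRICS = {
    "Accuracy": (0.94, 0.81),
    "Precision": (0.91, 0.68),
    "Recall": (0.89, 0.61),
    "F1 Score": (0.90, 0.64),
    "AUC-ROC": (0.97, 0.76),
    "Log Loss": (0.18, 0.49),
}
VALIDATION_POSITIVE_RATE = 0.21  # churn rate in the validation set


def gap_report(metrics):
    """Return {metric: (absolute_gap, relative_gap)}, gap = validation - training.

    The relative gap (absolute gap divided by the training value) is an
    illustrative addition; the table itself only reports absolute gaps.
    """
    report = {}
    for name, (train, valid) in metrics.items():
        absolute = valid - train
        report[name] = (round(absolute, 2), round(absolute / train, 3))
    return report


def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def majority_baseline_accuracy(positive_rate):
    """Accuracy of always predicting the more common class."""
    return max(positive_rate, 1 - positive_rate)


if __name__ == "__main__":
    for name, (absolute, relative) in gap_report(METRICS).items():
        print(f"{name:<10} gap={absolute:+.2f} relative={relative:+.1%}")

    # Sanity check: reported F1 should match the reported precision and recall.
    for split, idx in (("training", 0), ("validation", 1)):
        f1 = f1_from(METRICS["Precision"][idx], METRICS["Recall"][idx])
        print(f"F1 recomputed on {split}: {f1:.3f}")

    baseline = majority_baseline_accuracy(VALIDATION_POSITIVE_RATE)
    print(f"Majority-class baseline accuracy on validation: {baseline:.2f}")
```

Two things fall out of this arithmetic. The reported F1 scores agree with the reported precision and recall on both splits, so the table is internally consistent. And because the validation churn rate is 0.21, a model that always predicts "no churn" would reach 0.79 accuracy, so the validation accuracy of 0.81 is only about 0.02 above that baseline. Accuracy alone therefore says little here; the gaps in recall, AUC-ROC, and log loss carry more of the signal.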