ShopEase built a binary classification model to predict whether a user will purchase within 7 days of viewing a product. The model is a gradient-boosted tree classifier used to prioritize remarketing campaigns. It performed well during development, but marketing is concerned that results on newly launched traffic sources are weaker than expected.
| Metric | Cross-Validation (Train) | Holdout Test | New Production Data (Last 30 Days) |
|---|---|---|---|
| Accuracy | 0.91 | 0.89 | 0.82 |
| Precision | 0.78 | 0.75 | 0.61 |
| Recall | 0.72 | 0.69 | 0.48 |
| F1 Score | 0.75 | 0.72 | 0.54 |
| AUC-ROC | 0.90 | 0.87 | 0.76 |
| Positive Rate | 0.24 | 0.23 | 0.19 |
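
To make the gaps in the table concrete, the following is a minimal sketch that computes the absolute and relative drop from holdout to production for each metric, cross-checks F1 against the reported precision and recall, and compares accuracy with a trivial "always predict no purchase" baseline. The metric values are copied from the table. The function names and dictionary layout are illustrative assumptions, not part of ShopEase's tooling.

```python
# Metric values copied from the table above.
METRICS = {
    "Accuracy":  {"cv": 0.91, "holdout": 0.89, "prod": 0.82},
    "Precision": {"cv": 0.78, "holdout": 0.75, "prod": 0.61},
    "Recall":    {"cv": 0.72, "holdout": 0.69, "prod": 0.48},
    "F1 Score":  {"cv": 0.75, "holdout": 0.72, "prod": 0.54},
    "AUC-ROC":   {"cv": 0.90, "holdout": 0.87, "prod": 0.76},
}
POSITIVE_RATE = {"cv": 0.24, "holdout": 0.23, "prod": 0.19}


def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def gaps(metrics, ref="holdout", new="prod"):
    """Return {metric: (absolute drop, relative drop)} from ref to new."""
    out = {}
    for name, row in metrics.items():
        abs_drop = row[ref] - row[new]
        out[name] = (round(abs_drop, 2), round(abs_drop / row[ref], 3))
    return out


def majority_baseline_accuracy(positive_rate):
    """Accuracy of always predicting the negative class (no purchase)."""
    return 1 - positive_rate


for name, (abs_drop, rel_drop) in gaps(METRICS).items():
    print(f"{name:10s} drop: {abs_drop:.2f} ({rel_drop:.1%} relative)")

for split, rate in POSITIVE_RATE.items():
    baseline = majority_baseline_accuracy(rate)
    lift = METRICS["Accuracy"][split] - baseline
    print(f"{split:8s} baseline accuracy {baseline:.2f}, model lift {lift:+.2f}")
```

Two observations follow directly from the table's numbers. Recall loses roughly 30% of its holdout value in production, the largest relative drop of any metric. With a production positive rate of 0.19, always predicting "no purchase" would already score 0.81 accuracy, so the reported 0.82 is a weak signal by itself.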
The model appears strong on cross-validation and holdout test data, with only a small gap between the two, but performance drops materially on recent production data: recall falls from 0.69 on the holdout set to 0.48, and AUC-ROC falls from 0.87 to 0.76. The team wants to know whether the model truly generalizes to unseen data, what the metric gaps imply, and how to validate this systematically before expanding campaign spend.
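
Because the concern centers on newly launched traffic sources, one possible starting point for systematic validation is to break production metrics down by source rather than relying on the aggregate. The sketch below does this in plain Python. The field names (`traffic_source`, `purchased`, `predicted`) and the toy records are hypothetical placeholders for illustration only. They are not ShopEase data.

```python
from collections import defaultdict


def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels (1 = purchase)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Undefined ratios are reported as NaN instead of raising.
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    return precision, recall


def metrics_by_segment(records, segment_key="traffic_source"):
    """Group records by segment and compute (precision, recall) per group."""
    grouped = defaultdict(lambda: ([], []))
    for r in records:
        y_true, y_pred = grouped[r[segment_key]]
        y_true.append(r["purchased"])
        y_pred.append(r["predicted"])
    return {seg: precision_recall(t, p) for seg, (t, p) in grouped.items()}


# Hypothetical toy records, for illustration only.
records = [
    {"traffic_source": "email", "purchased": 1, "predicted": 1},
    {"traffic_source": "email", "purchased": 0, "predicted": 0},
    {"traffic_source": "email", "purchased": 1, "predicted": 1},
    {"traffic_source": "email", "purchased": 0, "predicted": 1},
    {"traffic_source": "new_social", "purchased": 1, "predicted": 0},
    {"traffic_source": "new_social", "purchased": 1, "predicted": 1},
    {"traffic_source": "new_social", "purchased": 0, "predicted": 1},
    {"traffic_source": "new_social", "purchased": 0, "predicted": 0},
]

for segment, (precision, recall) in metrics_by_segment(records).items():
    print(f"{segment:12s} precision={precision:.2f} recall={recall:.2f}")
```

If the production drop is concentrated in the new sources, that would point toward a shift in the input population rather than overfitting during development. A uniform drop across all sources would suggest looking elsewhere, for example at changes in purchase behavior over time.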