You own a gradient-boosted churn prediction model used in Adobe Creative Cloud retention workflows. The model scores active subscribers weekly, and users scoring above a 0.70 threshold are sent to a save-offer campaign. The campaign budget is limited, so only the highest-risk users are targeted. Offline validation before launch looked strong, but after two months in production the retention team reports that many churned users were never targeted, while campaign efficiency appears roughly unchanged. You are asked to assess whether the model is still performing well in production and what the numbers imply.

| Metric | Pre-Launch Validation | Production (Last 4 Weeks) |
|---|---|---|
| Accuracy | 0.91 | 0.90 |
| Precision | 0.68 | 0.66 |
| Recall | 0.74 | 0.49 |
| F1 Score | 0.71 | 0.56 |
| AUC-ROC | 0.88 | 0.81 |
| Log Loss | 0.29 | 0.41 |
| Predicted positive rate | 11.8% | 8.1% |
| Actual churn rate | 10.5% | 14.2% |
| Avg. score for churned users | 0.63 | 0.51 |

How would you interpret this production performance, and what would you recommend doing next to evaluate and improve the model in production?
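
As a starting point for reading the table, the sketch below recomputes a few quantities that follow arithmetically from the reported precision, recall, and actual churn rate. The function names (`f1_score`, `implied_rates`) and the dictionary layout are hypothetical, chosen only for illustration. It assumes all metrics in a column were measured on the same population at the 0.70 threshold, which the scenario does not state.

```python
# Values copied from the table; the structure and names are illustrative only.
REPORTED = {
    "pre_launch": {"precision": 0.68, "recall": 0.74, "churn_rate": 0.105,
                   "flag_rate": 0.118},
    "production": {"precision": 0.66, "recall": 0.49, "churn_rate": 0.142,
                   "flag_rate": 0.081},
}


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def implied_rates(precision: float, recall: float, churn_rate: float) -> dict:
    """Derive confusion-matrix shares (as fractions of all scored users).

    Assumes precision, recall, and churn rate describe the same population.
    """
    tp = recall * churn_rate        # churners who were flagged
    fn = churn_rate - tp            # churners who were missed
    flagged = tp / precision        # implied predicted-positive rate
    fp = flagged - tp               # flagged users who did not churn
    return {
        "tp": tp,
        "fn": fn,
        "fp": fp,
        "implied_flag_rate": flagged,
        "implied_accuracy": 1 - fp - fn,
    }


for period, m in REPORTED.items():
    derived = implied_rates(m["precision"], m["recall"], m["churn_rate"])
    print(period)
    print(f"  F1 from precision/recall: {f1_score(m['precision'], m['recall']):.2f}")
    print(f"  share of churners missed: {1 - m['recall']:.0%}")
    print(f"  implied flag rate: {derived['implied_flag_rate']:.1%} "
          f"(reported: {m['flag_rate']:.1%})")
    print(f"  implied accuracy: {derived['implied_accuracy']:.2f}")
```

Under that assumption, the recomputed F1 values match the table (0.71 and 0.56), and a production recall of 0.49 means roughly half of churners scored below the threshold, which fits the retention team's report. Precision barely moved, which fits the unchanged campaign efficiency. The implied production flag rate comes out near 10.5% rather than the reported 8.1%; that gap may reflect different measurement windows or label definitions and is worth confirming before drawing conclusions.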