You own a binary churn prediction model used in Salesforce Marketing Cloud to trigger retention outreach for a subscription business. The model is a gradient-boosted tree classifier scored weekly, and customers scoring above a 0.50 threshold are routed to a retention campaign with a limited budget. Churn is rare, and leadership is concerned that the model's headline accuracy looks strong while the campaign is not reducing customer losses as much as expected. You are asked to evaluate whether the model is actually useful on this imbalanced dataset and whether the current threshold is appropriate.

| Metric | Validation Set |
|---|---|
| Positive class rate (churn) | 4.0% |
| Accuracy | 92.8% |
| Precision | 0.21 |
| Recall | 0.28 |
| F1 Score | 0.24 |
| AUC-ROC | 0.87 |
| Customers flagged for outreach | 2,700 / 50,000 |
| True churners caught | 560 / 2,000 |

| Confusion matrix count | Value |
|---|---|
| TP | 560 |
| FP | 2,140 |
| FN | 1,440 |
| TN | 45,860 |
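
As a sanity check, the headline metrics can be recomputed directly from the four confusion-matrix counts and compared against the summary table. The sketch below is a minimal illustration, not part of the original scenario: the function name `confusion_metrics` and the extra fields `majority_baseline_accuracy` and `lift` are assumptions introduced here to put accuracy and precision in context.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Derive headline metrics from raw confusion-matrix counts."""
    total = tp + fp + fn + tn
    prevalence = (tp + fn) / total          # share of actual churners
    precision = tp / (tp + fp)              # flagged customers who really churn
    recall = tp / (tp + fn)                 # churners who were flagged
    return {
        "prevalence": prevalence,
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        # Accuracy of a trivial model that never predicts churn.
        "majority_baseline_accuracy": (tn + fp) / total,
        # Precision relative to contacting customers at random.
        "lift": precision / prevalence,
    }


# Counts taken from the confusion matrix above.
metrics = confusion_metrics(tp=560, fp=2140, fn=1440, tn=45860)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Comparing `accuracy` with `majority_baseline_accuracy` shows how little accuracy says at a 4% positive rate, while `lift` expresses precision relative to untargeted outreach.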

How would you evaluate this model given the class imbalance, and how would you decide whether to keep the current threshold, move to a different operating point, or adopt a different evaluation approach?
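
One way to frame the operating-point question is to sweep candidate cutoffs and score each by expected net value under the contact budget. The sketch below is only an illustration under stated assumptions: the function `best_threshold`, the toy `scores` and `labels`, and the figures for `value_per_save`, `cost_per_contact`, and `max_contacts` are all hypothetical and do not come from the scenario. It also assumes a fixed expected value for every true churner contacted, which a real analysis would need to estimate (for example from a holdout or uplift test).

```python
def best_threshold(scores, labels, value_per_save, cost_per_contact, max_contacts):
    """Pick the score cutoff with the highest expected net value within a budget.

    scores: model probabilities; labels: 1 for churn, 0 otherwise.
    All monetary inputs are hypothetical placeholders.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    total_positives = sum(labels)
    best = {"threshold": None, "contacts": 0, "net_value": 0.0,
            "precision": None, "recall": None}
    tp = 0
    for k, (score, label) in enumerate(ranked, start=1):
        if k > max_contacts:
            break  # budget exhausted
        tp += label
        # A cutoff cannot separate tied scores, so only evaluate at the
        # last customer of each tied group.
        if k < len(ranked) and ranked[k][0] == score:
            continue
        net_value = tp * value_per_save - k * cost_per_contact
        if net_value > best["net_value"]:
            best = {
                "threshold": score,  # contact customers with score >= this
                "contacts": k,
                "net_value": net_value,
                "precision": tp / k,
                "recall": tp / total_positives if total_positives else 0.0,
            }
    return best


# Toy data for illustration only.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20]
labels = [1, 0, 1, 0, 0, 1, 0, 0]
result = best_threshold(scores, labels, value_per_save=50,
                        cost_per_contact=10, max_contacts=5)
print(result)
```

The design choice here is to rank by score and treat the threshold as a consequence of budget and unit economics rather than a fixed 0.50, which ties the evaluation to the campaign's actual constraint.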