StreamBox built a logistic regression model to predict 30-day subscription churn so that the CRM team can target retention offers. On a single 20% holdout split the model looked strong, but leadership is concerned that the result is overly optimistic because performance varies from one data split to another. The table below compares the holdout result with 5-fold cross-validation.

| Metric | Single Holdout | 5-Fold Cross-Validation Mean | Fold Std Dev |
|---|---|---|---|
| Accuracy | 0.91 | 0.84 | 0.05 |
| Precision | 0.72 | 0.61 | 0.08 |
| Recall | 0.68 | 0.49 | 0.10 |
| F1 Score | 0.70 | 0.54 | 0.09 |
| AUC-ROC | 0.88 | 0.79 | 0.06 |
| Positive class rate | 0.18 | 0.18 | 0.01 |

Per-fold F1 scores from the 5-fold cross-validation:

| Fold | F1 Score |
|---|---|
| 1 | 0.66 |
| 2 | 0.58 |
| 3 | 0.52 |
| 4 | 0.47 |
| 5 | 0.45 |
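
The per-fold F1 values listed above can be checked against the reported summary with a few lines of arithmetic. The sketch below uses only the numbers in the table; the variable names are illustrative, and it assumes the reported "Fold Std Dev" is the sample standard deviation (n - 1 denominator), which is the convention that reproduces the 0.54 mean and 0.09 spread. It also shows that the holdout F1 of 0.70 sits above every individual fold, roughly 1.9 fold standard deviations above the cross-validation mean.

```python
from statistics import mean, stdev

# Per-fold F1 scores and the single-holdout F1, copied from the table.
fold_f1 = [0.66, 0.58, 0.52, 0.47, 0.45]
holdout_f1 = 0.70

cv_mean = mean(fold_f1)
cv_std = stdev(fold_f1)  # sample standard deviation (assumed convention)
gap = holdout_f1 - cv_mean
gap_in_stds = gap / cv_std

print(f"CV mean F1: {cv_mean:.2f}")
print(f"CV std dev: {cv_std:.2f}")
print(f"Fold range: {min(fold_f1):.2f} to {max(fold_f1):.2f}")
print(f"Holdout minus CV mean: {gap:.2f} ({gap_in_stds:.1f} fold std devs)")
```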
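
The team needs to explain why cross-validation matters here and decide whether the model is reliable enough to launch. Every cross-validated metric is lower than its holdout counterpart (F1 drops from 0.70 to 0.54, recall from 0.68 to 0.49), and the holdout F1 exceeds even the best fold (0.66), which suggests that the single split overstates true generalization performance.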
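
To make the comparison concrete, here is a minimal sketch of how a mean and per-fold spread like those in the table could be produced with scikit-learn. It is not StreamBox's actual pipeline: the data is a synthetic stand-in generated with `make_classification` (with an assumed 18% positive class to mirror the table), and the feature counts, the scaling step, and the random seeds are all illustrative assumptions. The scores it prints will not match the table.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the churn data (assumption): about 18% positives.
X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=5,
    weights=[0.82, 0.18], random_state=0,
)

# Scaling sits inside the pipeline so it is re-fit on each training fold,
# which keeps information from the validation fold out of preprocessing.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Stratified folds keep the churn rate roughly equal across folds.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
metrics = ["accuracy", "precision", "recall", "f1", "roc_auc"]
scores = cross_validate(model, X, y, cv=cv, scoring=metrics)

# Report the mean and sample standard deviation across the five folds.
summary = {}
for m in metrics:
    fold_scores = scores[f"test_{m}"]
    summary[m] = (fold_scores.mean(), fold_scores.std(ddof=1))
    print(f"{m:>9}: mean={summary[m][0]:.2f}  std={summary[m][1]:.2f}")
```

Reporting the fold standard deviation alongside the mean is the point of the exercise: a single holdout score gives one draw from this distribution, with no indication of how far it might sit from the average.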