ShopLens is building a binary classification model to predict whether a customer will purchase within 7 days after viewing a product. A logistic regression baseline was trained on 120,000 sessions, but the team only evaluated it on a single random train/test split and is unsure whether the reported performance is stable enough to trust.

| Metric | Single Holdout Result |
|---|---|
| Accuracy | 0.84 |
| Precision | 0.61 |
| Recall | 0.38 |
| F1 Score | 0.47 |
| AUC-ROC | 0.79 |
| Positive Class Rate | 0.14 |
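
A quick arithmetic check on the table helps put the accuracy figure in context. The sketch below uses only the values reported above and assumes the 0.14 positive class rate also holds in the evaluation set, which the table does not state explicitly.

```python
# Values copied from the holdout table.
precision, recall = 0.61, 0.38
positive_rate = 0.14  # assumption: this rate also applies to the test split

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

# Accuracy of a trivial model that always predicts "no purchase".
majority_baseline_accuracy = 1 - positive_rate

print(f"F1 from precision and recall: {f1:.2f}")                          # 0.47
print(f"Always-negative accuracy:     {majority_baseline_accuracy:.2f}")  # 0.86
```

The reported F1 is consistent with the reported precision and recall. Under the stated assumption, the 0.84 accuracy does not exceed what an always-negative model would score, so precision, recall, F1, and AUC are likely the more informative metrics to track across folds.
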

The product team wants a more reliable estimate of model performance before launch. Because the positive class is relatively rare and customer behavior varies by traffic source and week, a single split may be giving an overly optimistic or unstable view of performance. You need to explain how you would implement cross-validation for this model evaluation and how you would use the results to decide whether the model is ready.