Data Society is evaluating a binary classification model that predicts whether a learner will complete a course within 30 days. A junior team member trained a gradient boosted tree model and reported strong training performance, but stakeholders are concerned that the model may not generalize well in production.
| Metric | Training Set | Validation Set | Test Set |
|---|---|---|---|
| Accuracy | 0.96 | 0.81 | 0.80 |
| Precision | 0.95 | 0.78 | 0.77 |
| Recall | 0.94 | 0.69 | 0.68 |
| F1 Score | 0.95 | 0.73 | 0.72 |
| AUC-ROC | 0.98 | 0.84 | 0.83 |
| Log Loss | 0.11 | 0.46 | 0.49 |
| Model variant B (simpler logistic regression baseline) achieved: training accuracy 0.78, validation accuracy 0.77, test accuracy 0.77, validation F1 0.74, and test F1 0.74. |
You need to determine whether the gradient boosted model is overfitting or underfitting, explain how the metrics support that conclusion, and recommend what Data Society should do next.