Ancestry has trained a gradient-boosted classifier to rank the record hints shown on a member's family tree. The model predicts whether a user will accept a hint within 7 days. After launch, offline training performance looked excellent, but validation and holdout results were materially worse, and product partners are concerned the model may be overfitting.

| Metric | Training | Validation | Holdout Test |
|---|---|---|---|
| Accuracy | 0.94 | 0.81 | 0.80 |
| Precision | 0.92 | 0.76 | 0.75 |
| Recall | 0.90 | 0.68 | 0.66 |
| F1 Score | 0.91 | 0.72 | 0.70 |
| AUC-ROC | 0.97 | 0.84 | 0.83 |
| Log Loss | 0.18 | 0.49 | 0.52 |
| Positive rate | 0.41 | 0.39 | 0.40 |

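
One way to read the table is to compute how much each metric degrades from one split to the next. The sketch below hard-codes the table values; the helper names (`degradation`, `split_gaps`) are illustrative and not part of any Ancestry tooling.

```python
# Metric values copied from the table: (training, validation, holdout).
METRICS = {
    "accuracy": (0.94, 0.81, 0.80),
    "precision": (0.92, 0.76, 0.75),
    "recall": (0.90, 0.68, 0.66),
    "f1": (0.91, 0.72, 0.70),
    "auc_roc": (0.97, 0.84, 0.83),
    "log_loss": (0.18, 0.49, 0.52),
}

# Log loss is the only metric here where a smaller value is better.
LOWER_IS_BETTER = {"log_loss"}


def degradation(name, earlier, later):
    """How much worse `later` is than `earlier` (positive means worse)."""
    diff = earlier - later
    if name in LOWER_IS_BETTER:
        diff = -diff
    return round(diff, 2)


def split_gaps(metrics=METRICS):
    """Per-metric degradation for train->validation and validation->holdout."""
    gaps = {}
    for name, (train, val, test) in metrics.items():
        gaps[name] = {
            "train_to_val": degradation(name, train, val),
            "val_to_test": degradation(name, val, test),
        }
    return gaps


if __name__ == "__main__":
    for name, gap in split_gaps().items():
        print(f"{name:10s} train->val {gap['train_to_val']:+.2f}   "
              f"val->test {gap['val_to_test']:+.2f}")
```

With these numbers, every score-type metric drops by 0.13 to 0.22 between training and validation (log loss rises by 0.31), while validation and holdout differ by at most 0.03. That pattern is consistent with overfitting to the training data, but it does not by itself rule out drift between the training months and the later months, since the training column is an in-sample score.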
The current model uses 140 features, including tree size, record type, historical hint acceptance behavior, session activity, and record-source metadata. Training used 4.2M labeled hints from Jan-Jun 2025; validation used Jul 2025; holdout test used Aug 2025.
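
Because the three splits are consecutive time windows, one way to get an out-of-time estimate from the training months alone is rolling-origin (expanding-window) validation. The sketch below is a minimal illustration: the month strings, the `min_train` value of 3, and the assumption that the 7-day acceptance window starts on the day the hint is shown are all hypothetical choices, not details confirmed by the problem statement.

```python
from datetime import date, timedelta

# Training window from the problem statement (Jan-Jun 2025), as month keys.
TRAIN_MONTHS = ["2025-01", "2025-02", "2025-03", "2025-04", "2025-05", "2025-06"]

# The label is "accepted within 7 days"; assumed here to count from the
# day the hint was shown.
LABEL_WINDOW = timedelta(days=7)


def rolling_origin_folds(months, min_train=3):
    """Expanding-window folds: train on months[:i], validate on months[i]."""
    return [(months[:i], months[i]) for i in range(min_train, len(months))]


def label_is_mature(shown_on, cutoff):
    """True if the 7-day acceptance window had closed on or before `cutoff`.

    Rows whose window is still open at the training cutoff have labels
    that are not yet final and can be excluded from that fold.
    """
    return shown_on + LABEL_WINDOW <= cutoff


if __name__ == "__main__":
    for train_months, val_month in rolling_origin_folds(TRAIN_MONTHS):
        print(f"train {train_months[0]}..{train_months[-1]} -> validate {val_month}")
    # A hint shown on 28 June still has an open window on 1 July.
    print(label_is_mature(date(2025, 6, 28), date(2025, 7, 1)))
```

Each fold's validation score is then comparable to the Jul and Aug numbers in a way the in-sample training score is not. The same cutoff discipline applies to features such as historical hint acceptance behavior: they would need to be computed only from events before the hint was shown, or they can leak the label.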
You need to determine whether this pattern indicates overfitting, underfitting, or a different evaluation issue, and recommend how Ancestry should validate and improve the model before wider rollout.
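
A useful sanity check when validating the model is to compare its log loss against a constant prediction at the base rate. The sketch below assumes the "Positive rate" row is the label prevalence in each split (the table does not say so explicitly); the function names are illustrative.

```python
import math

# Values copied from the table; "positive rate" is assumed to be label prevalence.
POSITIVE_RATE = {"training": 0.41, "validation": 0.39, "holdout": 0.40}
MODEL_LOG_LOSS = {"training": 0.18, "validation": 0.49, "holdout": 0.52}


def base_rate_log_loss(p):
    """Log loss of always predicting probability p when prevalence is p."""
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))


def log_loss_skill(model_loss, p):
    """Fraction of the base-rate log loss removed by the model (0 = no skill)."""
    return 1 - model_loss / base_rate_log_loss(p)


if __name__ == "__main__":
    for split, p in POSITIVE_RATE.items():
        baseline = base_rate_log_loss(p)
        skill = log_loss_skill(MODEL_LOG_LOSS[split], p)
        print(f"{split:10s} baseline {baseline:.3f}  "
              f"model {MODEL_LOG_LOSS[split]:.2f}  skill {skill:.2f}")
```

With a positive rate near 0.40, the base-rate log loss is roughly 0.67 on every split, so the validation and holdout values of 0.49 and 0.52 still beat it. The model retains signal out of time, but far less than the training figure of 0.18 suggests, which makes probability calibration on later months worth checking alongside ranking quality.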