Validate Suspected Classification Overfitting

Scenario

You own a gradient-boosted binary classifier that prioritizes satellite imagery tiles for manual review in a defense monitoring workflow. Tiles scoring above a 0.60 threshold are escalated to analysts, and the model was approved after strong offline results on a held-out test set. Two weeks after deployment, analyst feedback suggests the alerts are less reliable than the offline metrics implied, and stakeholders ask whether the model is overfitting. You need to determine whether the gap stems from overfitting, from the choice of threshold, or from data shift.

Performance Data

Metric	Training	Validation	Holdout Test	Production Week 2
Accuracy	0.98	0.89	0.88	0.84
Precision	0.97	0.81	0.79	0.71
Recall	0.96	0.76	0.74	0.69
F1 Score	0.97	0.78	0.76	0.70
AUC-ROC	0.99	0.86	0.85	0.80
Positive prediction rate	18%	14%	13%	11%
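
One way to read the table is to split each metric's decline into three steps: training to validation, validation to holdout test, and holdout test to production. The sketch below does that arithmetic on the figures above. The names `METRICS` and `split_gaps` are illustrative, not part of any existing tooling, and the interpretation in the comments is a common heuristic rather than a conclusive test.

```python
# Values copied from the Performance Data table, in the order
# (Training, Validation, Holdout Test, Production Week 2).
METRICS = {
    "Accuracy":  (0.98, 0.89, 0.88, 0.84),
    "Precision": (0.97, 0.81, 0.79, 0.71),
    "Recall":    (0.96, 0.76, 0.74, 0.69),
    "F1 Score":  (0.97, 0.78, 0.76, 0.70),
    "AUC-ROC":   (0.99, 0.86, 0.85, 0.80),
}


def split_gaps(values):
    """Return the drop between each pair of consecutive data splits."""
    train, val, test, prod = values
    return {
        # A large drop here is the usual symptom of fitting noise in training data.
        "train_to_val": round(train - val, 2),
        # A small drop here suggests the holdout estimate was not tuned against.
        "val_to_test": round(val - test, 2),
        # A drop here cannot come from memorizing training data alone; it points
        # toward data shift, label differences, or a threshold that no longer fits.
        "test_to_prod": round(test - prod, 2),
    }


if __name__ == "__main__":
    for name, values in METRICS.items():
        print(f"{name:<10} {split_gaps(values)}")
```

On these figures the training-to-validation drop is the largest step for every metric (for example, 0.16 for precision), validation and holdout test nearly agree, and a further drop appears between holdout test and production (0.08 for precision). That pattern is consistent with both a generalization gap and some post-deployment change, so the two need to be investigated separately.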

Question

How would you determine whether this model is truly overfitting, and what validation approach or model changes would you recommend before expanding deployment?
