Monitor Production Model Drift

Context

ShopLens uses a gradient-boosted binary classifier to predict whether a user session will convert within 24 hours, so the marketing platform can trigger high-value discount offers. The model performed well at launch, but over the last 8 weeks the growth team has reported lower campaign ROI despite similar traffic volume.

Current Performance

Metric	Validation at Launch	Last 30 Days in Production	Change
AUC-ROC	0.84	0.76	-0.08
Precision @ threshold 0.60	0.41	0.33	-0.08
Recall @ threshold 0.60	0.58	0.47	-0.11
F1 Score	0.48	0.39	-0.09
Log Loss	0.46	0.61	+0.15
Calibration error	0.03	0.11	+0.08
Avg predicted conversion rate	12.4%	14.8%	+2.4 pts
Actual conversion rate	11.9%	9.1%	-2.8 pts
PSI on top 12 features	0.08	0.27	+0.19
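The table reports PSI and a calibration error but does not define how either was computed. Below is a minimal Python sketch of common definitions. It assumes quantile bins taken from the launch sample for PSI, and equal-width probability bins for expected calibration error (ECE). The prompt does not confirm that either definition matches ShopLens's metrics, and the function names are hypothetical.

A frequently cited rule of thumb reads PSI below 0.1 as stable, 0.1 to 0.25 as a moderate shift, and above 0.25 as a significant shift. Under that reading, the production value of 0.27 falls in the significant band. The F1 helper only recomputes the table's F1 row from its precision and recall rows, as a consistency check.

```python
import numpy as np


def psi(reference, current, n_bins=10, eps=1e-6):
    """Population Stability Index of `current` against `reference` (e.g. launch data).

    Cut points are reference quantiles, so each bin holds ~1/n_bins of the
    reference sample; values outside the reference range land in the end bins.
    """
    reference = np.asarray(reference, dtype=float)
    current = np.asarray(current, dtype=float)
    cuts = np.quantile(reference, np.linspace(0, 1, n_bins + 1)[1:-1])
    ref_share = np.bincount(np.searchsorted(cuts, reference, side="right"),
                            minlength=n_bins) / len(reference)
    cur_share = np.bincount(np.searchsorted(cuts, current, side="right"),
                            minlength=n_bins) / len(current)
    # Clip empty bins so the log term stays finite.
    ref_share = np.clip(ref_share, eps, None)
    cur_share = np.clip(cur_share, eps, None)
    return float(np.sum((cur_share - ref_share) * np.log(cur_share / ref_share)))


def expected_calibration_error(y_true, y_prob, n_bins=10):
    """Weighted mean |avg predicted - observed rate| over equal-width score bins."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    bin_ids = np.minimum((y_prob * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        in_bin = bin_ids == b
        if in_bin.any():
            gap = abs(y_prob[in_bin].mean() - y_true[in_bin].mean())
            ece += in_bin.mean() * gap  # weight each bin by its share of rows
    return float(ece)


def f1_from_pr(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Consistency check against the table: launch vs last 30 days.
print(round(f1_from_pr(0.41, 0.58), 2), round(f1_from_pr(0.33, 0.47), 2))  # 0.48 0.39
```

The gap between average predicted (14.8%) and actual (9.1%) conversion is a coarser, label-based calibration signal. ECE additionally shows which score ranges are miscalibrated.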

The Problem

You need to design a production drift monitoring approach that would have detected this degradation early, separated data drift from concept drift, and triggered the right investigation or retraining workflow.
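One way to make "separate data drift from concept drift" concrete is a rule-based triage. It combines label-free signals (feature PSI, mean predicted score), which are available the same day, with label-based signals (AUC, actual conversion rate), which lag by the 24-hour label delay.

The sketch below is a heuristic, not an established procedure. The `WindowMetrics` fields and every threshold are hypothetical. Note that a drop in AUC alongside input drift cannot, on its own, distinguish covariate shift from a changed relationship between inputs and outcome. Separating those usually needs a follow-up, such as comparing performance within segments whose feature distributions did not move.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WindowMetrics:
    """Summary of one monitoring window (field names are hypothetical)."""
    feature_psi: float     # label-free, available the same day
    mean_predicted: float  # label-free, available the same day
    auc: float             # label-based, available after the 24-hour label delay
    actual_rate: float     # label-based, available after the 24-hour label delay


def triage(baseline: WindowMetrics, current: WindowMetrics,
           psi_alert: float = 0.25, auc_drop_alert: float = 0.05,
           rate_shift_alert: float = 0.02, calib_gap_alert: float = 0.02) -> list[str]:
    """Return heuristic drift findings; all thresholds are illustrative."""
    findings = []
    input_drift = current.feature_psi >= psi_alert
    if input_drift:
        findings.append("input drift")
    if baseline.auc - current.auc >= auc_drop_alert:
        # Stable inputs plus worse ranking points toward concept drift; with
        # drifted inputs, covariate shift alone could also explain the drop.
        findings.append("ranking loss with input drift" if input_drift
                        else "concept drift suspected")
    if abs(current.actual_rate - baseline.actual_rate) >= rate_shift_alert:
        findings.append("target drift")  # base conversion rate moved
    if abs(current.mean_predicted - current.actual_rate) >= calib_gap_alert:
        findings.append("calibration drift")  # scores no longer match outcomes
    return findings


launch = WindowMetrics(feature_psi=0.08, mean_predicted=0.124, auc=0.84, actual_rate=0.119)
last_30d = WindowMetrics(feature_psi=0.27, mean_predicted=0.148, auc=0.76, actual_rate=0.091)
print(triage(launch, last_30d))
# ['input drift', 'ranking loss with input drift', 'target drift', 'calibration drift']
```

Fed the table's launch and last-30-day values, the heuristic raises all four findings. That is consistent with several drift types acting together rather than a single cause.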

Requirements

Explain what the metric changes suggest about drift type and severity.
Define the production monitoring dashboard and alert thresholds you would implement.
Specify which input, output, and delayed-label metrics should be tracked daily versus weekly.
Recommend how to diagnose whether the issue is feature drift, target drift, calibration drift, or threshold mismatch.
Propose a response plan for mild drift vs severe drift.
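The dashboard, alert-threshold, and cadence requirements can be expressed as data. Each metric gets one rule with:

- a cadence,
- a flag for whether it needs delayed labels, and
- separate mild and severe thresholds.

In the sketch below, the metric names, cadences, and threshold values are hypothetical placeholders. Only the PSI bands follow the common 0.1/0.25 rule of thumb. Placing calibration and precision on a weekly cadence assumes daily label counts may be too small for stable estimates.

The response text is illustrative. It assumes threshold or calibration adjustments are allowed between full retrains, which the prompt does not state; it only limits full retraining to every 2 weeks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertRule:
    metric: str
    cadence: str           # "daily" or "weekly"
    needs_labels: bool     # label-based rules lag by the 24-hour label delay
    warn: float            # mild-drift threshold (hypothetical)
    critical: float        # severe-drift threshold (hypothetical)
    higher_is_worse: bool = True


# Hypothetical rules; thresholds should be tuned on historical production data.
RULES = [
    AlertRule("feature_psi", "daily", False, warn=0.10, critical=0.25),
    AlertRule("mean_predicted_shift_pts", "daily", False, warn=1.0, critical=2.0),
    AlertRule("auc_drop", "daily", True, warn=0.02, critical=0.05),
    AlertRule("calibration_error", "weekly", True, warn=0.05, critical=0.08),
    AlertRule("precision_at_threshold", "weekly", True, warn=0.38, critical=0.35,
              higher_is_worse=False),
]

RESPONSES = {
    "ok": "no action",
    "mild": "investigate; consider a threshold or calibration adjustment",
    "severe": "escalate; prioritize retraining in the next compliance window",
}

_RANK = {"ok": 0, "mild": 1, "severe": 2}


def severity(rule: AlertRule, value: float) -> str:
    # Flip the sign for metrics where lower is worse, so one comparison works.
    sign = 1.0 if rule.higher_is_worse else -1.0
    if sign * value >= sign * rule.critical:
        return "severe"
    if sign * value >= sign * rule.warn:
        return "mild"
    return "ok"


def evaluate(values: dict[str, float]) -> tuple[str, dict[str, str]]:
    """Score the metrics that are present; the worst severity sets the overall level."""
    per_metric = {r.metric: severity(r, values[r.metric]) for r in RULES if r.metric in values}
    overall = max(per_metric.values(), key=_RANK.__getitem__, default="ok")
    return overall, per_metric


def rules_for(cadence: str) -> list[str]:
    """Metric names shown on the daily or weekly dashboard panel."""
    return [r.metric for r in RULES if r.cadence == cadence]


overall, detail = evaluate({"feature_psi": 0.27, "mean_predicted_shift_pts": 2.4,
                            "auc_drop": 0.08, "calibration_error": 0.11,
                            "precision_at_threshold": 0.33})
print(overall, "->", RESPONSES[overall])
```

With the table's production values, every rule in this hypothetical configuration lands at "severe". The PSI and mean-score rules are label-free, so in principle they can fire a day before the label-based rules.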

Constraints

Labels arrive with a 24-hour delay.
Discount offers have a fixed weekly budget of $180,000.
False positives waste discounts; false negatives miss revenue.
Full retraining can only be done every 2 weeks due to compliance review.
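Two constraints interact with the 0.60 threshold:

- The fixed weekly budget caps how many offers can go out.
- Inflated scores can push more sessions over a fixed cutoff, spending budget on false positives. Average predicted conversion rose from 12.4% to 14.8% while actual conversion fell.

A minimal sketch follows. It picks the score cut that keeps projected spend within $180,000, assuming a hypothetical flat `cost_per_offer` charged per offer sent. The prompt does not say what an offer costs or whether cost accrues on send or on redemption. Because the cut uses scores only, it can be recomputed daily without waiting for the 24-hour label delay.

```python
import numpy as np


def budget_threshold(recent_scores, weekly_budget=180_000.0, cost_per_offer=20.0):
    """Lowest score cut whose offer count fits the weekly budget.

    `recent_scores` stands in for next week's sessions (e.g. last week's scores);
    sessions scoring strictly above the returned cut receive an offer.
    `cost_per_offer` is a hypothetical flat cost per offer sent.
    """
    scores = np.sort(np.asarray(recent_scores, dtype=float))[::-1]  # highest first
    max_offers = int(weekly_budget // cost_per_offer)
    if max_offers >= len(scores):
        return 0.0  # budget covers every scored session; it is not the binding limit
    # At most max_offers scores are strictly greater than scores[max_offers],
    # so ties at the cut never push spend over budget.
    return float(scores[max_offers])


cut = budget_threshold([0.9, 0.8, 0.7, 0.6, 0.5], weekly_budget=60.0, cost_per_offer=20.0)
print(cut)  # 0.6 -> offers go to the 3 sessions scoring above 0.6
```

One option is to send offers only to sessions scoring above the larger of 0.60 and this budget cut. Whether to go below 0.60 when the budget allows is a precision/recall trade-off. It depends on the relative cost of false positives and false negatives, which the prompt leaves unquantified.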
