Compare Baseline and New Model

Context

ShopLens built a binary classification model to predict whether a customer will click a recommended product in the app. The team has been using a logistic regression baseline and is testing a new gradient boosting model before rollout. Leadership sees slightly better top-line metrics from the new model, but the product team is concerned about whether the improvement is meaningful and whether any tradeoffs are hidden.

Current Performance

Metric	Baseline Model	New Model	Change (New - Baseline)
Accuracy	0.842	0.861	+0.019
Precision	0.610	0.680	+0.070
Recall	0.550	0.470	-0.080
F1 Score	0.578	0.554	-0.024
AUC-ROC	0.781	0.824	+0.043
Log Loss	0.462	0.418	-0.044
Positive rate in data	0.180	0.180	0.000
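
To make the table easier to reason about operationally, the reported precision, recall, and positive rate can be converted into approximate confusion-matrix counts. The sketch below does this per 1,000 scored recommendations. The function names and the choice of 1,000 as the scale are illustrative assumptions, and the inputs are the rounded values from the table, so the outputs are estimates rather than the team's actual counts.

```python
def implied_counts(precision, recall, positive_rate, n=1000):
    """Back out approximate confusion-matrix counts per n examples.

    Assumes the reported precision, recall, and positive rate all come
    from the same evaluation set (an assumption, not stated in the table).
    """
    positives = positive_rate * n            # actual clicks
    tp = recall * positives                  # clicks the model caught
    fn = positives - tp                      # clicks the model missed
    fp = tp * (1 - precision) / precision    # recommended but not clicked
    tn = n - tp - fn - fp                    # correctly not recommended
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


baseline = implied_counts(0.610, 0.550, 0.180)
new = implied_counts(0.680, 0.470, 0.180)

for name, counts in (("baseline", baseline), ("new", new)):
    rounded = {k: round(v, 1) for k, v in counts.items()}
    implied_accuracy = (counts["TP"] + counts["TN"]) / 1000
    print(name, rounded, "implied accuracy:", round(implied_accuracy, 3))
```

If the implied accuracy differs noticeably from the reported accuracy, that is worth raising as a consistency question (rounding, different evaluation sets, or different thresholds) before drawing conclusions from the table.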

The Problem

The new model improves accuracy, precision, AUC-ROC, and log loss, but its recall falls from 0.550 to 0.470, so it catches fewer actual clicks, and its F1 score drops as well. You need to explain how the baseline and the new model differ in a practical evaluation setting and determine whether the new model should replace the baseline.

Requirements

Explain what the baseline model represents and why it is used in evaluation.
Compare the new model against the baseline using the metrics above.
Identify which metrics improved and which worsened, and what that means operationally.
Recommend whether to ship the new model as-is, adjust its threshold, or keep the baseline.
Describe what additional validation or error analysis you would run before making a final decision.
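
One way to explore the "adjust its threshold" option is to sweep candidate thresholds on held-out scores and keep the one with the highest recall that still meets a precision floor. The sketch below uses only the standard library. The toy `scores` and `labels`, the 0.6 precision floor, and the function names are all hypothetical; real validation scores from the new model would take their place.

```python
def precision_recall_at(scores, labels, threshold):
    """Precision and recall when predicting positive for score >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def best_recall_threshold(scores, labels, min_precision, grid):
    """Return (threshold, precision, recall) with the highest recall among
    thresholds whose precision is at least min_precision, or None."""
    best = None
    for t in grid:
        p, r = precision_recall_at(scores, labels, t)
        if p >= min_precision and (best is None or r > best[2]):
            best = (t, p, r)
    return best


# Hypothetical toy data standing in for held-out model scores and click labels.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
grid = [k / 10 for k in range(1, 10)]

best = best_recall_threshold(scores, labels, min_precision=0.6, grid=grid)
print(best)
```

The threshold should be chosen on a validation split and then confirmed on a separate test split, since tuning and reporting on the same data would overstate the result.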

Constraints

Recommendation slots are limited, so false positives create poor user experience.
Missing a true click opportunity reduces engagement and revenue.
The team can only support one production model this quarter.
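
The first two constraints pull in opposite directions, so the decision depends on how costly a missed click is relative to a wasted recommendation slot. The sketch below estimates the break-even cost ratio at which the two models, at their current thresholds, would have equal total error cost. It assumes a simple linear cost model and uses the rounded table values; neither the cost model nor any actual cost figures are given in the problem.

```python
def errors_per_1000(precision, recall, positive_rate):
    """Approximate false positives and false negatives per 1,000 examples."""
    positives = positive_rate * 1000
    tp = recall * positives
    fp = tp * (1 - precision) / precision
    fn = positives - tp
    return fp, fn


def breakeven_fn_to_fp_cost_ratio(baseline_errors, new_errors):
    """Ratio cost(FN) / cost(FP) at which both models cost the same.

    Assumed cost model: total = FP * c_fp + FN * c_fn.
    Costs are equal when (fp_b - fp_n) * c_fp == (fn_n - fn_b) * c_fn.
    """
    fp_b, fn_b = baseline_errors
    fp_n, fn_n = new_errors
    return (fp_b - fp_n) / (fn_n - fn_b)


baseline_errors = errors_per_1000(0.610, 0.550, 0.180)
new_errors = errors_per_1000(0.680, 0.470, 0.180)
ratio = breakeven_fn_to_fp_cost_ratio(baseline_errors, new_errors)
print(round(ratio, 2))
```

Under these assumptions, the new model has the lower total cost only if a missed click costs less than roughly 1.6 times a false positive; above that ratio the baseline, or the new model at a lower threshold, would be preferable.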
