Pyramid Consulting uses a binary classification model in its candidate-job matching workflow to predict whether a submitted candidate will be accepted for a client interview. The model is deployed in an internal recruiter-facing tool and assigns a score to each candidate profile so recruiters can prioritize outreach.
The team is debating which metric should drive model selection because different stakeholders care about different errors. Recruiters want fewer low-quality submissions (false positives), while account managers want to avoid missing candidates who would likely convert (false negatives).
| Metric | Model A @ 0.70 threshold | Model B @ 0.40 threshold | Baseline Rules Engine |
|---|---|---|---|
| Precision | 0.91 | 0.63 | 0.58 |
| Recall | 0.42 | 0.81 | 0.54 |
| F1 Score | 0.57 | 0.71 | 0.56 |
| ROC-AUC | 0.79 | 0.86 | 0.68 |
| Predicted positive rate | 9% | 28% | 22% |
| Actual positive rate | 24% | 24% | 24% |
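F1 is the harmonic mean of precision and recall, so the table's F1 row can be recomputed from its precision and recall rows as a consistency check. A minimal sketch (the `f1` helper name is ours, not from the case):

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Precision/recall pairs taken from the comparison table.
models = {
    "Model A @ 0.70": (0.91, 0.42),
    "Model B @ 0.40": (0.63, 0.81),
    "Baseline rules":  (0.58, 0.54),
}

for name, (p, r) in models.items():
    print(f"{name}: F1 = {f1(p, r):.2f}")
```

Running this reproduces the table's F1 values (0.57, 0.71, 0.56), confirming the rows are internally consistent.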
Client interview slots are limited, and each false positive wastes recruiter time and may hurt client trust. However, each false negative means Pyramid Consulting may miss a qualified candidate and lose placement revenue. You need to determine which metric should be prioritized for this use case and whether one model is clearly better.
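Because the "right" metric depends on the relative cost of the two error types, an expected-cost comparison can make the trade-off concrete. The sketch below derives approximate false-positive and false-negative counts per 1,000 submissions from each model's precision and recall (using the 24% actual positive rate from the table) and weighs them under hypothetical unit costs; the cost ratios are illustrative assumptions, not figures from the case.

```python
def error_counts(precision: float, recall: float, positives: float):
    """Derive approximate FP and FN counts from precision and recall,
    given the number of true positives available in the pool."""
    tp = recall * positives            # true positives found
    predicted_pos = tp / precision     # total flagged as positive
    fp = predicted_pos - tp            # wasted submissions
    fn = positives - tp                # missed qualified candidates
    return fp, fn

def expected_cost(fp: float, fn: float, cost_fp: float, cost_fn: float) -> float:
    """Total cost of errors under assumed per-error unit costs."""
    return fp * cost_fp + fn * cost_fn

POSITIVES = 240  # 24% actual positive rate per 1,000 submissions

fp_a, fn_a = error_counts(0.91, 0.42, POSITIVES)  # Model A @ 0.70
fp_b, fn_b = error_counts(0.63, 0.81, POSITIVES)  # Model B @ 0.40

# Hypothetical scenario 1: a missed placement (FN) costs 5x a wasted
# submission (FP) -- Model B's higher recall wins.
print(expected_cost(fp_a, fn_a, 1, 5), expected_cost(fp_b, fn_b, 1, 5))

# Hypothetical scenario 2: client trust dominates, so an FP costs 5x an
# FN -- Model A's higher precision wins.
print(expected_cost(fp_a, fn_a, 5, 1), expected_cost(fp_b, fn_b, 5, 1))
```

The flip between the two scenarios is the point: neither model dominates on every metric, so the choice hinges on which error Pyramid Consulting decides is more expensive.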