Evaluate Coaching Match Quality Model

Context

BetterUp uses a binary classification model to predict whether a newly proposed coach-member match will lead to a successful first 30 days, defined as the member attending the first session and giving a post-session rating of 4 or 5. The model is used in the matching workflow inside BetterUp Care to prioritize recommended coach options.

A new version of the model was deployed last month. Leadership sees slightly higher overall accuracy, but member complaints about poor-fit recommendations have increased, especially for enterprise members in their first week on the platform.

Current Performance

Metric	Previous Model	Current Model	Change
Accuracy	0.74	0.78	+0.04
Precision	0.69	0.81	+0.12
Recall	0.76	0.52	-0.24
F1 Score	0.72	0.63	-0.09
AUC-ROC	0.80	0.79	-0.01
Positive prediction rate	0.41	0.24	-0.17
Successful matches in eval set	3,600	3,600	0

The Problem

The current model is more conservative: it flags fewer matches as likely successful (positive prediction rate 0.24, down from 0.41), and those recommendations are more often correct (precision 0.81, up from 0.69), but it misses many matches that would have succeeded (recall 0.52, down from 0.76). You need to assess whether the new model is actually better for BetterUp's matching experience.
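To make the trade-off concrete, the rounded metrics in the table can be turned back into approximate confusion-matrix cells. The sketch below is illustrative only: the helper name `implied_counts` is hypothetical, it assumes the 3,600 successful matches are the actual positives in the evaluation set, and because the published metrics are rounded to two decimals the resulting counts are estimates rather than exact values.

```python
from typing import NamedTuple


class Counts(NamedTuple):
    true_pos: int
    false_neg: int
    false_pos: int


def implied_counts(actual_positives: int, precision: float, recall: float) -> Counts:
    """Back out approximate confusion-matrix cells from rounded metrics.

    Assumption: `actual_positives` is the number of truly successful
    matches in the evaluation set (3,600 in the table).
    """
    # recall = TP / (TP + FN), and TP + FN is the number of actual positives
    true_pos = round(recall * actual_positives)
    false_neg = actual_positives - true_pos
    # precision = TP / (TP + FP), so TP + FP = TP / precision
    predicted_pos = true_pos / precision
    false_pos = round(predicted_pos - true_pos)
    return Counts(true_pos, false_neg, false_pos)


previous = implied_counts(3600, precision=0.69, recall=0.76)
current = implied_counts(3600, precision=0.81, recall=0.52)

print("previous:", previous)
print("current: ", current)
print("extra missed successful matches:", current.false_neg - previous.false_neg)
print("false positives avoided:", previous.false_pos - current.false_pos)
```

Under these assumptions, the current model misses roughly 864 more successful matches than the previous one (about 1,728 versus 864) while avoiding roughly 790 false positives (about 439 versus 1,229). True negatives are left out on purpose: deriving them requires the total evaluation-set size, which the table does not state directly.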

Requirements

Interpret what the metric changes imply about model behavior.
Explain why higher accuracy does not necessarily mean a better model here.
Use the confusion-matrix implications to discuss business impact.
Recommend which metrics should be primary for this use case and why.
Propose specific next steps to improve evaluation and model performance.

Constraints

BetterUp wants to avoid showing too few viable coach options to new members.
False positives create some member friction, but false negatives reduce match coverage and can delay time-to-first-session.
The matching team can only retrain and redeploy once every two weeks.
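Because the constraints treat false negatives as costlier than false positives, one way to reflect that in a single number is an F-beta score with beta greater than 1, which weights recall more heavily than precision. The following is a minimal sketch, not a prescribed metric: the choice of beta = 2 is an assumption made purely for illustration, and the function name `f_beta` is hypothetical.

```python
def f_beta(precision: float, recall: float, beta: float) -> float:
    """F-beta score: weighted harmonic mean of precision and recall.

    beta > 1 weights recall more heavily; beta = 1 gives the usual F1.
    """
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall taken from the performance table.
models = {
    "previous": {"precision": 0.69, "recall": 0.76},
    "current": {"precision": 0.81, "recall": 0.52},
}

for name, m in models.items():
    f1 = f_beta(m["precision"], m["recall"], beta=1.0)
    # beta = 2 is an illustrative assumption, not a value from the source.
    f2 = f_beta(m["precision"], m["recall"], beta=2.0)
    print(f"{name}: F1={f1:.2f}  F2={f2:.2f}")
```

With beta = 1 this reproduces the F1 values in the table (0.72 and 0.63). With the assumed beta = 2, the gap between the two models widens, which is consistent with the recall drop mattering more than the precision gain when match coverage is the priority.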
