HealthShield uses a binary classifier to detect fraudulent insurance claims before payout. The positive class is highly imbalanced: only 1% of 500,000 historical claims are fraudulent, yet the current model is judged internally on overall accuracy.
| Metric | Current Model | Naive Always-Negative Baseline |
|---|---|---|
| Fraud prevalence | 1.0% | 1.0% |
| Accuracy | 98.3% | 99.0% |
| Precision | 28.0% | 0.0% |
| Recall | 42.0% | 0.0% |
| F1 Score | 33.6% | 0.0% |
| AUC-ROC | 0.91 | 0.50 |
| PR-AUC | 0.31 | 0.01 |
| Claims flagged for review | 7,500 | 0 |

Leadership sees 98.3% accuracy and assumes the model is production-ready, but the fraud team counters that accuracy is misleading at 1% prevalence: a model that never flags anything scores 99.0%. Your task is to evaluate whether the model is actually useful, explain the tradeoffs the imbalance creates, and recommend improvements to both the evaluation protocol and the model itself.
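One way the recommendation could look in practice, sketched with scikit-learn on synthetic data (this is an illustrative stand-in, not HealthShield's pipeline; the dataset, model, and parameters are all assumptions): score the model with PR-AUC rather than accuracy, counteract the imbalance with class weighting, and tune the decision threshold on a validation set instead of defaulting to 0.5.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, precision_recall_curve
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the claims data: ~1% positive class.
X, y = make_classification(n_samples=50_000, n_features=20,
                           weights=[0.99], random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, stratify=y, random_state=0)

# class_weight="balanced" upweights the rare fraud class during training.
clf = LogisticRegression(class_weight="balanced", max_iter=1000)
clf.fit(X_tr, y_tr)

scores = clf.predict_proba(X_val)[:, 1]
# PR-AUC: a no-skill model scores ~= prevalence (0.01), not 0.5 like ROC-AUC.
print("PR-AUC:", average_precision_score(y_val, scores))

# Choose the operating threshold that maximizes F1 on validation data.
prec, rec, thresh = precision_recall_curve(y_val, scores)
f1 = 2 * prec * rec / np.maximum(prec + rec, 1e-12)
best = np.argmax(f1[:-1])  # last (prec, rec) point has no threshold
print("threshold:", thresh[best],
      "precision:", prec[best], "recall:", rec[best])
```

In production the threshold would be chosen against business costs (review capacity vs. fraud losses) rather than F1, but the mechanics are the same: sweep the precision-recall curve and pick the operating point deliberately.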