Choose Metrics for Rare Disease Screening

Context

MediScan built a binary classification model to detect a rare cardiac condition from routine screening data. The condition appears in only 2% of patients, and the clinical team is concerned that the current dashboard highlights accuracy even though missed cases can delay treatment.

Current Performance

Each model was evaluated on 10,000 patients, 200 of whom are actual positive cases.

Metric	Model A	Model B
Accuracy	97.0%	91.0%
Recall	25.0%	85.0%
Precision	62.5%	16.2%
F1 Score	35.7%	27.3%
False Negatives	150	30
False Positives	150	870
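
As a sanity check, the full confusion matrix for each model can be rebuilt from the stated totals (10,000 patients, 200 positives) and the False Negatives and False Positives rows, and the rate metrics recomputed from those counts. The sketch below is illustrative only; the `Confusion` class and `from_error_counts` helper are hypothetical names, not part of any MediScan code. For Model B the recomputed values (about 16.3% precision, 27.4% F1) match the table to within rounding. For Model A the counts reproduce the 97.0% accuracy and 25.0% recall but give 25.0% precision and 25.0% F1, whereas the tabled 62.5% and 35.7% would correspond to 30 false positives rather than 150. The source does not say which figure is intended, so it is worth stating which one an answer relies on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts for a binary confusion matrix (hypothetical helper)."""
    tp: int
    fp: int
    fn: int
    tn: int

    @property
    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.fn + self.tn)

    @property
    def recall(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    @property
    def f1(self) -> float:
        return 2 * self.tp / (2 * self.tp + self.fp + self.fn)


def from_error_counts(total: int, positives: int, fn: int, fp: int) -> Confusion:
    """Rebuild all four cells from the totals and the two error counts."""
    return Confusion(tp=positives - fn, fp=fp, fn=fn, tn=total - positives - fp)


# Counts taken from the table: 10,000 patients, 200 actual positives.
model_a = from_error_counts(total=10_000, positives=200, fn=150, fp=150)
model_b = from_error_counts(total=10_000, positives=200, fn=30, fp=870)

for name, cm in (("Model A", model_a), ("Model B", model_b)):
    print(
        f"{name}: TP={cm.tp} FP={cm.fp} FN={cm.fn} TN={cm.tn} "
        f"accuracy={cm.accuracy:.1%} recall={cm.recall:.1%} "
        f"precision={cm.precision:.1%} F1={cm.f1:.1%}"
    )
```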

The Problem

Leadership initially prefers Model A because its accuracy is much higher. However, clinicians argue that Model B is safer because it identifies far more true cases. You need to determine when accuracy is misleading in an imbalanced setting and how recall should influence model selection.
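
One way to see why accuracy misleads here is to score a trivial classifier that never flags anyone. With 200 positives in 10,000 patients it is right on every negative and wrong on every positive. The sketch below works this out; the function name `always_negative_baseline` is hypothetical and used only for illustration.

```python
def always_negative_baseline(total: int, positives: int) -> dict:
    """Metrics for a classifier that predicts 'no condition' for every patient."""
    true_negatives = total - positives  # every healthy patient is correct
    false_negatives = positives         # every sick patient is missed
    return {
        "accuracy": true_negatives / total,
        "recall": 0 / positives,        # no true positives at all
        "false_negatives": false_negatives,
    }


baseline = always_negative_baseline(total=10_000, positives=200)
model_a_accuracy = 0.97  # from the Current Performance table

print(f"Always-negative accuracy: {baseline['accuracy']:.1%}")
print(f"Always-negative recall:   {baseline['recall']:.1%}")
print(f"Model A beats baseline on accuracy: {model_a_accuracy > baseline['accuracy']}")
```

The baseline reaches 98.0% accuracy while detecting no cases, so Model A's 97.0% accuracy is lower than that of a model that does nothing. Accuracy is dominated by the 9,800 negatives and says little about how the 200 positives are handled.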

Requirements

Explain why accuracy alone is not sufficient for this problem.
Compare Model A and Model B using the provided metrics and confusion-matrix implications.
Recommend which metric should be prioritized for this use case and why.
Discuss the tradeoff between higher recall and lower precision.
Suggest how threshold tuning or workflow design could improve the final operating point.
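
For the threshold-tuning requirement, one possible approach is to choose the lowest score threshold whose flagged count still fits the review capacity, since lowering the threshold can only keep or raise recall. The sketch below is a minimal illustration under stated assumptions: `pick_threshold` is a hypothetical helper, the scores are synthetic draws (not MediScan data), the model is assumed to output a score where higher means more likely positive, and the evaluation set is treated as one week of screening volume, which the source does not state.

```python
import random


def pick_threshold(scores, labels, max_flagged):
    """Return the lowest-threshold operating point with at most max_flagged flags.

    A patient is flagged when score >= threshold. Assumes at least one positive
    label. Returns None if even the top-scoring group exceeds the capacity.
    """
    positives = sum(labels)
    ranked = sorted(zip(scores, labels), reverse=True)  # highest score first
    best = None
    tp = 0
    for i, (score, label) in enumerate(ranked):
        tp += label
        flagged = i + 1
        # Only evaluate where the score changes, so tied scores stay together.
        at_boundary = flagged == len(ranked) or ranked[i + 1][0] != score
        if not at_boundary:
            continue
        if flagged > max_flagged:
            break
        best = {
            "threshold": score,
            "flagged": flagged,
            "recall": tp / positives,
            "precision": tp / flagged,
        }
    return best


# Synthetic demo data (assumption): 200 positives among 10,000 patients.
rng = random.Random(0)
demo_labels = [1] * 200 + [0] * 9_800
demo_scores = [
    rng.betavariate(4, 2) if y == 1 else rng.betavariate(2, 5)
    for y in demo_labels
]

demo_point = pick_threshold(demo_scores, demo_labels, max_flagged=1_200)
print(demo_point)
```

On real data the same selection would be run on a held-out validation set, and the resulting recall and precision compared against the clinical cost of a missed case before fixing the operating point.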

Constraints

Missing a true positive delays specialist follow-up and has high clinical cost.
Every false positive triggers a manual review costing about $40.
The review team can handle at most 1,200 flagged patients per week.
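
These constraints can be turned into rough numbers for each model. The sketch below uses the False Negatives and False Positives counts from the table and makes two assumptions that the source does not state: the 10,000-patient evaluation set is treated as one week of screening volume, and only false positive reviews are costed at $40. The names `operating_summary` and `break_even_miss_cost` are hypothetical.

```python
REVIEW_COST_USD = 40     # cost of one manual review of a false positive
WEEKLY_CAPACITY = 1_200  # maximum flagged patients the review team can handle
POSITIVES = 200          # actual positive cases in the evaluation set

# Error counts copied from the Current Performance table.
models = {
    "Model A": {"fn": 150, "fp": 150},
    "Model B": {"fn": 30, "fp": 870},
}


def operating_summary(fn: int, fp: int) -> dict:
    """Flag volume and false positive review cost for one model."""
    tp = POSITIVES - fn
    flagged = tp + fp  # everyone the model sends to review
    return {
        "flagged": flagged,
        "within_capacity": flagged <= WEEKLY_CAPACITY,
        "fp_review_cost_usd": fp * REVIEW_COST_USD,
    }


def break_even_miss_cost(low_recall: dict, high_recall: dict) -> float:
    """Cost per missed case above which the higher-recall model is cheaper overall."""
    extra_review_cost = (high_recall["fp"] - low_recall["fp"]) * REVIEW_COST_USD
    misses_avoided = low_recall["fn"] - high_recall["fn"]
    return extra_review_cost / misses_avoided


summaries = {name: operating_summary(**counts) for name, counts in models.items()}
for name, summary in summaries.items():
    print(name, summary)

break_even = break_even_miss_cost(models["Model A"], models["Model B"])
print(f"Break-even cost per missed case: ${break_even:,.0f}")
```

Under these assumptions Model A flags 200 patients with $6,000 in false positive reviews, and Model B flags 1,040 patients with $34,800, which still fits within the 1,200 capacity. Model B's extra $28,800 in reviews avoids 120 missed cases, so it is the cheaper option whenever a missed case is judged to cost more than about $240.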
