Why Accuracy Fails in Screening

Context

MediScan built a binary classification model to detect a rare cancer from routine blood test results. The model is being considered for use as an initial screening tool, where missed positive cases are far more costly than sending some healthy patients to follow-up testing.

Current Performance

The test set contains 100,000 patients with a disease prevalence of 1%, so 1,000 patients have the cancer. The naive baseline predicts "no cancer" for every patient.

Metric	Model A	Naive Baseline
Accuracy	98.8%	99.0%
Precision	43.3%	0.0%
Recall	52.0%	0.0%
F1 Score	47.3%	0.0%
False Negatives	480	1,000
False Positives	680	0

The Problem

A product manager argues that the model should not be deployed because its 98.8% accuracy is lower than the naive baseline's 99.0%. The clinical team counters that accuracy is the wrong metric: the dataset is highly imbalanced, and a false negative costs far more than a false positive.
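The disagreement is easier to see once the full confusion matrix is rebuilt from the reported counts. The sketch below assumes the naive baseline predicts negative for every patient (consistent with its 1,000 false negatives and 0 false positives); the names `Confusion` and `from_counts` are illustrative and not part of any MediScan code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    tp: int
    fp: int
    fn: int
    tn: int

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.fn + self.tn

    def accuracy(self) -> float:
        return (self.tp + self.tn) / self.total

    def precision(self) -> float:
        flagged = self.tp + self.fp
        # Precision is undefined when nothing is flagged; report 0.0 by convention.
        return self.tp / flagged if flagged else 0.0

    def recall(self) -> float:
        positives = self.tp + self.fn
        return self.tp / positives if positives else 0.0

    def f1(self) -> float:
        denom = 2 * self.tp + self.fp + self.fn
        return 2 * self.tp / denom if denom else 0.0


def from_counts(total: int, prevalence: float, fn: int, fp: int) -> Confusion:
    """Rebuild the four cells from test-set size, prevalence, FN and FP."""
    positives = round(total * prevalence)
    negatives = total - positives
    return Confusion(tp=positives - fn, fp=fp, fn=fn, tn=negatives - fp)


# Counts taken from the Current Performance table.
model_a = from_counts(100_000, 0.01, fn=480, fp=680)
baseline = from_counts(100_000, 0.01, fn=1_000, fp=0)

for name, cm in [("Model A", model_a), ("Naive baseline", baseline)]:
    print(
        f"{name}: TP={cm.tp} FP={cm.fp} FN={cm.fn} TN={cm.tn} "
        f"acc={cm.accuracy():.4f} prec={cm.precision():.4f} "
        f"rec={cm.recall():.4f} f1={cm.f1():.4f}"
    )
```

With these counts, Model A finds 520 of the 1,000 cancers while the baseline finds none. The baseline's higher accuracy comes entirely from the 99,000 healthy patients it labels correctly by default.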

Requirements

Explain why accuracy is misleading in this scenario.
Use the metrics and confusion matrix counts to compare Model A with the naive baseline.
Identify which metrics are more appropriate for this use case and why.
Recommend whether MediScan should optimize for precision, recall, F1, or another metric.
Suggest how threshold tuning could change the tradeoff.
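As a rough illustration of the threshold-tuning requirement, the sketch below sweeps a decision threshold over model scores and reports precision, recall, and the number of patients flagged. The scores are synthetic and generated only for this example (Model A's real score distribution is not given), and `sweep` is a hypothetical helper.

```python
import random


def sweep(scores, labels, thresholds):
    """Return (threshold, precision, recall, flagged) for each threshold."""
    positives = sum(labels)
    rows = []
    for t in thresholds:
        # A patient is flagged when the score meets or exceeds the threshold.
        flagged = [y for s, y in zip(scores, labels) if s >= t]
        tp = sum(flagged)
        precision = tp / len(flagged) if flagged else 0.0
        recall = tp / positives if positives else 0.0
        rows.append((t, precision, recall, len(flagged)))
    return rows


# Synthetic data only: about 1% prevalence, positives score higher on average.
rng = random.Random(0)
labels = [1 if rng.random() < 0.01 else 0 for _ in range(20_000)]
scores = [
    min(1.0, max(0.0, rng.gauss(0.65 if y else 0.30, 0.15)))
    for y in labels
]

rows = sweep(scores, labels, [0.3, 0.4, 0.5, 0.6, 0.7])
for t, p, r, n in rows:
    print(f"threshold={t:.1f} precision={p:.3f} recall={r:.3f} flagged={n}")
```

Lowering the threshold can only keep or increase recall, because every patient flagged at a higher threshold is still flagged at a lower one. Precision usually falls at the same time, since more healthy patients are flagged.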

Constraints

Each missed cancer case can delay treatment significantly.
Each false positive leads to a follow-up diagnostic costing $250.
The hospital can handle at most 2,000 follow-up tests per 100,000 patients.
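The constraints can be turned into a quick arithmetic check on Model A's reported counts. This sketch assumes every flagged patient (true or false positive) consumes one follow-up slot, which is one reading of the capacity limit; `screening_load` is a hypothetical helper.

```python
FOLLOW_UP_COST_USD = 250     # cost per false-positive follow-up (from Constraints)
CAPACITY_PER_100K = 2_000    # maximum follow-up tests per 100,000 patients


def screening_load(tp: int, fp: int) -> dict:
    """Summarize follow-up volume and false-positive cost per 100,000 patients."""
    follow_ups = tp + fp  # assumption: every flagged patient gets a follow-up
    return {
        "follow_ups": follow_ups,
        "within_capacity": follow_ups <= CAPACITY_PER_100K,
        "headroom": CAPACITY_PER_100K - follow_ups,
        "false_positive_cost_usd": fp * FOLLOW_UP_COST_USD,
    }


# Model A: 520 true positives (1,000 cases minus 480 missed), 680 false positives.
model_a_load = screening_load(tp=520, fp=680)
print(model_a_load)
```

Under that assumption, Model A sends 1,200 patients to follow-up and leaves 800 slots unused, with $170,000 spent on false-positive follow-ups per 100,000 patients screened. That unused capacity is the room a lower threshold could use to trade extra false positives for fewer missed cancers.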
