MediScan built a binary classifier to flag chest X-rays for possible pneumonia so radiologists can prioritize urgent cases. The model is now in production, but hospital leadership is concerned that the team is focusing on the wrong metric when deciding whether to adjust the decision threshold.
| Metric | Validation Set | Notes |
|---|---|---|
| Precision | 0.91 | 91% of flagged scans are true pneumonia cases |
| Recall | 0.68 | 68% of all pneumonia cases are detected |
| F1 Score | 0.78 | Harmonic mean of precision and recall |
| Accuracy | 0.97 | (544 + 9,146) / 10,000; inflated by class imbalance |
| AUC-ROC | 0.89 | Good ranking ability overall |
| Positive class prevalence | 8% | Pneumonia is relatively rare |
At the current threshold, the confusion matrix on 10,000 labeled scans is:
| | Predicted Positive | Predicted Negative |
|---|---|---|
| Actual Positive | 544 | 256 |
| Actual Negative | 54 | 9,146 |
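The headline metrics can be recomputed directly from these four cells, which is a useful sanity check before debating threshold changes:

```python
# Recompute the reported metrics from the confusion matrix above.
tp, fn = 544, 256   # actual positives: detected vs. missed pneumonia cases
fp, tn = 54, 9_146  # actual negatives: false alarms vs. correct clears

precision = tp / (tp + fp)                           # 544 / 598  ≈ 0.91
recall = tp / (tp + fn)                              # 544 / 800  = 0.68
f1 = 2 * precision * recall / (precision + recall)   # ≈ 0.78
accuracy = (tp + tn) / (tp + fp + fn + tn)           # 9,690 / 10,000 ≈ 0.97
prevalence = (tp + fn) / (tp + fp + fn + tn)         # 800 / 10,000 = 0.08

print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f} "
      f"accuracy={accuracy:.2f} prevalence={prevalence:.2f}")
```

Note how accuracy stays high no matter what: a model that flagged nothing would still score 0.92 at 8% prevalence, which is why accuracy is the wrong metric for this decision.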
The radiology operations lead wants fewer false alarms to reduce unnecessary urgent reviews, while the chief medical officer is more concerned about missed pneumonia cases. You need to explain the difference between precision and recall using the model's actual results and recommend how the team should reason about threshold changes.
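The two stakeholders are pulling the threshold in opposite directions: raising it trades missed cases (lower recall) for fewer false alarms (higher precision), and lowering it does the reverse. A minimal sketch of that trade-off, using synthetic scores and labels as stand-ins for MediScan's real validation outputs (the score distributions here are invented for illustration):

```python
# Hypothetical threshold sweep on a synthetic validation set with the
# same 8% positive prevalence as the report. Scores are made up: positives
# are drawn to score higher than negatives, as a working model's would.
import random

random.seed(0)
labels = [1 if random.random() < 0.08 else 0 for _ in range(10_000)]
scores = [random.betavariate(5, 2) if y else random.betavariate(2, 5)
          for y in labels]

results = {}
for threshold in (0.3, 0.5, 0.7, 0.9):
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    results[threshold] = (precision, recall)
    print(f"threshold={threshold:.1f}  precision={precision:.2f}  "
          f"recall={recall:.2f}  false_alarms={fp}  missed_cases={fn}")
```

The point for the discussion: there is no threshold that satisfies both asks simultaneously. The team should attach a cost to each missed pneumonia case and each unnecessary urgent review, then pick the threshold that minimizes expected cost, rather than optimizing precision or recall in isolation.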