MediScan built a binary classifier to flag chest X-rays for possible pneumonia so that radiologists can prioritize urgent cases. After deployment, the hospital noticed that while the model catches most true pneumonia cases, it also sends a large number of healthy scans for review.
| Metric | Validation Set | Notes |
|---|---|---|
| Precision | 0.62 | 62% of flagged scans are truly positive |
| Recall | 0.91 | 91% of actual pneumonia cases are detected |
| F1 Score | 0.74 | Harmonic mean of precision and recall |
| Accuracy | 0.89 | Inflated by class imbalance: always predicting "healthy" would score 0.88 |
| AUC-ROC | 0.93 | Strong ranking ability overall |
| Positive class prevalence | 0.12 | 12% of scans are true pneumonia |
| Daily flagged scans | 290 | Review queue created by the model |
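The table's metrics can be combined to estimate what the daily review queue actually contains. The back-of-envelope sketch below uses only the reported precision, recall, and flag count; the derived counts are rounded, so small inconsistencies with the reported accuracy are expected.

```python
# Back-of-envelope composition of the daily review queue,
# derived from the reported metrics (counts are approximate).

precision = 0.62        # share of flagged scans that are true pneumonia
recall = 0.91           # share of true pneumonia cases that get flagged
flagged_per_day = 290   # scans the model sends for review each day

# Precision: of 290 flagged scans, ~62% are real pneumonia cases.
true_positives = precision * flagged_per_day         # ~180 scans
false_positives = flagged_per_day - true_positives   # ~110 scans

# Recall: those ~180 caught cases are 91% of all pneumonia cases,
# so the remainder are missed entirely.
actual_positives = true_positives / recall           # ~198 cases
missed_cases = actual_positives - true_positives     # ~18 cases

print(f"True positives per day:  {true_positives:.0f}")
print(f"False positives per day: {false_positives:.0f}")
print(f"Missed cases per day:    {missed_cases:.0f}")
```

In concrete terms: roughly 110 healthy scans land in the queue every day, while about 18 pneumonia cases per day are still missed despite the high recall.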
Clinical leadership wants to understand whether this model is appropriately tuned. Missing a pneumonia case is costly, but too many false alarms increase radiologist workload and delay other urgent reads. The team needs a clear explanation of precision and recall, what these values imply here, and whether the decision threshold should be adjusted.
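The threshold question can be explored directly: raising the decision threshold flags fewer scans, which typically raises precision (fewer false alarms) but lowers recall (more missed cases). The sketch below illustrates this on a small set of hypothetical model scores, not MediScan's actual data.

```python
# Illustrative threshold sweep on hypothetical scores (not MediScan's data),
# showing how precision and recall trade off as the threshold moves.

def precision_recall(scores, labels, threshold):
    """Compute precision and recall when flagging scores >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical scores: positives tend to score higher, with some overlap.
scores = [0.95, 0.88, 0.81, 0.76, 0.64, 0.58, 0.45, 0.33, 0.21, 0.12]
labels = [1,    1,    1,    0,    1,    0,    0,    1,    0,    0]

for t in (0.3, 0.5, 0.7):
    p, r = precision_recall(scores, labels, t)
    print(f"threshold={t:.1f}  precision={p:.2f}  recall={r:.2f}")
```

On this toy data, moving the threshold from 0.3 to 0.7 lifts precision from 0.62 to 0.75 while dropping recall from 1.00 to 0.60. Whether MediScan should make a similar move depends on the relative cost of a missed pneumonia case versus an unnecessary review, which is a clinical judgment rather than a modeling one.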