MedNova Health plans to deploy a gradient boosting model that predicts whether an emergency department patient will develop sepsis within 6 hours. The model will trigger an early-intervention workflow, including additional labs and physician review. Because this is a high-stakes clinical setting, leadership wants to know whether current performance is strong enough for deployment.
Validation was performed on 48,000 recent ED visits from a hospital not used in training. Sepsis prevalence in this set is 6.0% (2,880 cases).
| Metric | Development CV | External Validation | Target |
|---|---|---|---|
| Precision | 0.41 | 0.32 | >= 0.30 |
| Recall | 0.86 | 0.74 | >= 0.85 |
| F1 Score | 0.56 | 0.45 | >= 0.50 |
| AUC-ROC | 0.91 | 0.84 | >= 0.88 |
| Calibration slope | 0.98 | 0.71 | 0.90-1.10 |
| False positive rate | 0.08 | 0.11 | <= 0.10 |
| Alert rate | 12.5% | 13.9% | <= 12.0% |
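The table can be checked against its targets mechanically. The following is a minimal sketch in Python, not part of MedNova's tooling: the dictionary layout and the helper names `meets_target` and `failing_metrics` are illustrative assumptions, and the alert rate is written as a proportion (0.125 rather than 12.5%) so all metrics share one scale.

```python
# Reported metrics from the table: (development CV, external validation).
# Alert rate is expressed as a proportion for consistency with the other rows.
metrics = {
    "Precision": (0.41, 0.32),
    "Recall": (0.86, 0.74),
    "F1 Score": (0.56, 0.45),
    "AUC-ROC": (0.91, 0.84),
    "Calibration slope": (0.98, 0.71),
    "False positive rate": (0.08, 0.11),
    "Alert rate": (0.125, 0.139),
}

# Targets as inclusive (low, high) bounds; None means unbounded on that side.
targets = {
    "Precision": (0.30, None),
    "Recall": (0.85, None),
    "F1 Score": (0.50, None),
    "AUC-ROC": (0.88, None),
    "Calibration slope": (0.90, 1.10),
    "False positive rate": (None, 0.10),
    "Alert rate": (None, 0.120),
}


def meets_target(value, bounds):
    """Return True if value lies within the inclusive (low, high) bounds."""
    low, high = bounds
    return (low is None or value >= low) and (high is None or value <= high)


def failing_metrics(column):
    """List metrics that miss target; column 0 = development, 1 = external."""
    return [
        name
        for name, values in metrics.items()
        if not meets_target(values[column], targets[name])
    ]


dev_failures = failing_metrics(0)
ext_failures = failing_metrics(1)

print("Development misses:", dev_failures)
print("External validation misses:", ext_failures)
```

Under these assumptions, the development column misses only the alert rate target, while the external validation column misses every target except precision.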
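The model met every target in development except alert rate (12.5% against a 12.0% ceiling), but on external validation only precision (0.32) still meets its target. Recall falls from 0.86 to 0.74, well short of the 0.85 target, and the calibration slope drops from 0.98 to 0.71; a slope below 1 indicates that predicted risks are too extreme, a pattern often associated with overfitting to the development data. Missing sepsis cases is dangerous, while too many false alerts can overwhelm clinicians and reduce trust.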
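To make the trade-off between missed cases and false alerts concrete, the external validation metrics can be converted into approximate patient counts. This is a sketch under stated assumptions: it uses only the reported prevalence, recall, and precision, and the function name `implied_counts` and its output keys are hypothetical.

```python
# External validation figures reported in the source.
n_visits = 48_000
n_sepsis = 2_880      # 6.0% prevalence
recall = 0.74
precision = 0.32


def implied_counts(n_total, n_positive, recall, precision):
    """Approximate confusion-matrix counts implied by recall and precision."""
    true_alerts = recall * n_positive           # sepsis cases the model flags
    missed_cases = n_positive - true_alerts     # sepsis cases with no alert
    total_alerts = true_alerts / precision      # precision = TP / all alerts
    false_alerts = total_alerts - true_alerts   # alerts on non-sepsis visits
    n_negative = n_total - n_positive
    return {
        "true_alerts": true_alerts,
        "missed_cases": missed_cases,
        "false_alerts": false_alerts,
        "total_alerts": total_alerts,
        "alert_rate": total_alerts / n_total,
        "false_positive_rate": false_alerts / n_negative,
    }


counts = implied_counts(n_visits, n_sepsis, recall, precision)
for name, value in counts.items():
    # Rates are printed as proportions, counts as whole patients.
    if name.endswith("rate"):
        print(f"{name}: {value:.3f}")
    else:
        print(f"{name}: {value:,.0f}")
```

On these inputs the model would miss roughly 749 of the 2,880 sepsis cases and raise about 4,529 false alerts among 6,660 total alerts, an alert rate of about 13.9%. Because the tabulated metrics are rounded, these counts are approximate; for instance, the false positive rate they imply is about 0.10 rather than the tabulated 0.11.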