NorthRiver Health deployed a gradient-boosting model to predict inpatient sepsis risk within the next 12 hours and trigger rapid-response review. After a six-week pilot across three hospitals, leadership sees mixed results: the model improved early detection, but clinicians report excessive alert volume and uneven performance across units.
| Metric | Validation Set | Pilot Production | Target |
|---|---|---|---|
| AUC-ROC | 0.89 | 0.84 | >= 0.85 |
| Precision | 0.41 | 0.28 | >= 0.35 |
| Recall | 0.76 | 0.71 | >= 0.75 |
| F1 Score | 0.53 | 0.40 | >= 0.50 |
| Alert rate | 9.8% | 14.2% | <= 10% |
| Calibration slope | 0.97 | 0.78 | 0.95-1.05 |
| Median lead time before diagnosis | 5.6 hrs | 4.1 hrs | >= 4 hrs |
| ICU unit recall | 0.81 | 0.79 | >= 0.75 |
| General ward recall | 0.74 | 0.63 | >= 0.72 |
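The table can be summarized mechanically by checking each pilot value against its target band. The sketch below is illustrative scoring logic, not NorthRiver's evaluation code; the values are copied from the table above, with one-sided targets expressed as an open bound.

```python
def check_targets(metrics):
    """Map each metric name to True/False: does the value meet its target band?

    metrics: {name: (value, lower_bound_or_None, upper_bound_or_None)}
    """
    return {
        name: (lo is None or value >= lo) and (hi is None or value <= hi)
        for name, (value, lo, hi) in metrics.items()
    }

# Pilot-production values and targets from the table above.
pilot = {
    "AUC-ROC":           (0.84,  0.85, None),
    "Precision":         (0.28,  0.35, None),
    "Recall":            (0.71,  0.75, None),
    "F1":                (0.40,  0.50, None),
    "Alert rate":        (0.142, None, 0.10),
    "Calibration slope": (0.78,  0.95, 1.05),
    "Lead time (hrs)":   (4.1,   4.00, None),
    "ICU recall":        (0.79,  0.75, None),
    "Ward recall":       (0.63,  0.72, None),
}

failed = [name for name, ok in check_targets(pilot).items() if not ok]
print(failed)
# Only lead time and ICU recall meet their targets; the other seven metrics miss.
```

Seven of nine production metrics miss their targets, which frames the success question: the two that pass (lead time, ICU recall) are the most directly clinical ones.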
During the pilot, 18,400 admissions were scored. Sepsis prevalence was 6.5% (1,196 cases). At the current threshold, the model generated 2,613 alerts, including 732 true positives and 1,881 false positives.
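A quick arithmetic check of these counts (values from the text; rounding conventions assumed) confirms the reported precision and alert rate, and surfaces one discrepancy worth reconciling:

```python
# Pilot counts as reported in the text.
admissions = 18_400
cases      = 1_196   # 6.5% prevalence
alerts     = 2_613
true_pos   = 732
false_pos  = 1_881

assert true_pos + false_pos == alerts  # counts are internally consistent

precision = true_pos / alerts            # ~0.280, matches the table's 0.28
alert_rate = alerts / admissions         # ~0.142, matches the 14.2% alert rate
recall_from_counts = true_pos / cases    # ~0.612, BELOW the table's 0.71 --
                                         # possibly alert-level vs case-level
                                         # counting; worth reconciling
print(round(precision, 3), round(alert_rate, 3), round(recall_from_counts, 3))
```

The recall implied by these counts (~0.61) does not match the reported 0.71; before judging the project, it would be worth asking how true positives were attributed (per alert, per admission, or per sepsis case).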
You need to assess whether this AI project should be considered successful in a healthcare setting, where clinical benefit, alert burden, calibration, and patient safety all matter, not just discrimination metrics such as AUC-ROC.