Business Context
NorthStar Health runs a network of 18 hospitals and wants an early-warning model to identify ICU patients at risk of developing sepsis within the next 6 hours. The model will support clinician review, so strong recall and clear feature-level explanations matter more than raw accuracy.
Dataset
You are given de-identified ICU encounter data collected from the network's EHR over 24 months.
| Feature Group | Count | Examples |
|---|---|---|
| Demographics | 6 | age, sex, BMI, admission_source |
| Vitals | 14 | heart_rate, respiratory_rate, systolic_bp, spo2, temperature |
| Labs | 18 | lactate, WBC, creatinine, bilirubin, platelet_count |
| Treatments / events | 7 | vasopressor_flag, antibiotic_started, fluid_bolus_ml |
| Temporal aggregates | 11 | 1h delta HR, 3h mean MAP, 6h urine_output_trend |
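The temporal aggregates in the last row can be sketched as follows. This is a minimal illustration, not the pipeline that produced the dataset. The helper names (`temporal_features`, `ols_slope`), the per-signal input format (timestamps in hours since admission, ascending), and the exact definitions of the 1-hour delta, 3-hour mean and 6-hour slope are all assumptions. The key design point is that every feature uses only observations at or before the scoring time.

```python
from statistics import mean


def _window(points, start_h, end_h):
    """Observations with start_h < time <= end_h."""
    return [(t, v) for t, v in points if start_h < t <= end_h]


def ols_slope(points):
    """Least-squares slope in units per hour; None if it cannot be estimated."""
    if len(points) < 2:
        return None
    t_bar = mean(t for t, _ in points)
    v_bar = mean(v for _, v in points)
    denom = sum((t - t_bar) ** 2 for t, _ in points)
    if denom == 0:
        return None
    return sum((t - t_bar) * (v - v_bar) for t, v in points) / denom


def temporal_features(times_h, values, now_h):
    """Hypothetical helper: trend features for one signal at scoring time now_h.

    times_h are hours since ICU admission in ascending order. Only observations
    at or before now_h are used, so no future measurements leak into the row.
    """
    past = [(t, v) for t, v in zip(times_h, values) if t <= now_h]
    if not past:
        return {"delta_1h": None, "mean_3h": None, "slope_6h": None}
    latest = past[-1][1]
    # Readings taken at least 1 hour before now_h; the last one is the baseline.
    earlier = [v for t, v in past if t <= now_h - 1.0]
    last_3h = [v for _, v in _window(past, now_h - 3.0, now_h)]
    return {
        "delta_1h": latest - earlier[-1] if earlier else None,
        "mean_3h": mean(last_3h) if last_3h else None,
        "slope_6h": ols_slope(_window(past, now_h - 6.0, now_h)),
    }
```

For example, `temporal_features([0, 1, 2, 3, 4], [80, 84, 90, 96, 104], now_h=4)` on hourly heart-rate readings gives a 1-hour delta of 8, a 3-hour mean of about 96.7 and a 6-hour slope of about 6 beats per minute per hour. The same helper applies to MAP or urine output.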
- Size: 92K ICU stays, 56 features
- Target: Binary label indicating whether the patient develops sepsis in the next 6 hours
- Class balance: 11.4% positive, 88.6% negative
- Missing data: 20-35% missing in some lab features because tests are ordered irregularly; <3% missing in vitals
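Because lab gaps come from irregular ordering rather than random loss, whether a test was ordered can itself carry signal. One common way to keep that signal is to carry the last result forward within a stay and add explicit missingness features, as in the sketch below. The fill values, the 48-hour cap and the `lab_features` helper are hypothetical. Fill values should be computed from the training split only, so that validation data does not leak into preprocessing.

```python
# Hypothetical placeholder fill values for labs never measured in a stay.
# In practice these would be computed from the training split only
# (e.g. training-set medians) and versioned with the model.
TRAIN_FILL = {"lactate": 1.2, "wbc": 8.0}
MAX_GAP_H = 48.0  # hypothetical cap on "hours since last test"


def lab_features(obs, lab, now_h, fill=TRAIN_FILL):
    """Last observation carried forward plus explicit missingness signals.

    obs: (time_h, value) pairs for one lab in one ICU stay, sorted by time.
    Returns the carried-forward value, a never-measured flag and the hours
    since the most recent result (capped at MAX_GAP_H).
    """
    past = [(t, v) for t, v in obs if t <= now_h and v is not None]
    if not past:
        return {
            f"{lab}_last": fill[lab],
            f"{lab}_never_measured": 1,
            f"{lab}_hours_since": MAX_GAP_H,
        }
    t_last, v_last = past[-1]
    return {
        f"{lab}_last": v_last,
        f"{lab}_never_measured": 0,
        f"{lab}_hours_since": min(now_h - t_last, MAX_GAP_H),
    }
```

The never-measured flag and the hours-since column let the model tell an imputed value apart from a real, recent result, which plain imputation would hide.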
Success Criteria
A solution is considered good enough if it achieves AUC-ROC >= 0.86 and recall >= 0.80 at an operating threshold where precision >= 0.35, and if it provides a ranked explanation of the most important risk drivers for clinical review.
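The two numeric targets can be checked with a short sketch. `auc_roc` computes AUC-ROC from ranks (the Mann-Whitney formulation), and `pick_threshold` sweeps thresholds from the highest score down and keeps the one with the highest recall whose precision is still at least 0.35. The function names are hypothetical; in a scikit-learn workflow, `roc_auc_score` and `precision_recall_curve` from `sklearn.metrics` cover the same ground.

```python
def auc_roc(y_true, scores):
    """Rank-based AUC-ROC (Mann-Whitney U); tied scores get average ranks.

    Hypothetical helper. Assumes 0/1 labels with both classes present.
    """
    pairs = sorted(zip(scores, y_true))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # average 1-based rank of the tie group
        i = j + 1
    n_pos = sum(y for _, y in pairs)
    n_neg = n - n_pos
    pos_rank_sum = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def pick_threshold(y_true, scores, min_precision=0.35):
    """Highest-recall threshold whose precision is at least min_precision.

    An alert fires when score >= threshold. Returns (threshold, precision,
    recall), or None if no threshold reaches min_precision.
    """
    ranked = sorted(zip(scores, y_true), reverse=True)
    n_pos = sum(y_true)
    tp = fp = 0
    best = None
    for i, (s, y) in enumerate(ranked):
        tp += y
        fp += 1 - y
        # Evaluate only after the last item of a run of tied scores.
        if i + 1 < len(ranked) and ranked[i + 1][0] == s:
            continue
        precision, recall = tp / (tp + fp), tp / n_pos
        if precision >= min_precision and (best is None or recall > best[2]):
            best = (s, precision, recall)
    return best


def meets_criteria(y_true, scores):
    """Check AUC >= 0.86 and recall >= 0.80 at precision >= 0.35."""
    choice = pick_threshold(y_true, scores, min_precision=0.35)
    return auc_roc(y_true, scores) >= 0.86 and choice is not None and choice[2] >= 0.80
```

The threshold should be chosen on validation data and then frozen before the test set is scored; otherwise the reported recall is optimistic.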
Constraints
- Inference must complete in <100 ms per patient during batch scoring, which runs every 15 minutes
- The model must be explainable enough for clinician trust and auditability
- False negatives are costly, but excessive false positives create alert fatigue
- Retraining should be feasible on a monthly cadence using standard Python ML tooling
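The precision floor translates directly into alert load: at precision p, each true alert comes with (1 - p) / p false ones, so precision 0.35 means roughly 1.86 false alerts per true alert. The sketch below turns an operating point into expected counts. The `alert_burden` helper and the 1,000-prediction volume in the usage line are hypothetical, and reading the 11.4% class balance as the positive rate per scored prediction is an assumption; the per-window rate under 15-minute scoring may differ.

```python
def alert_burden(n_scored, prevalence, recall, precision):
    """Hypothetical helper: expected alert counts at one operating point.

    Assumes `prevalence` is the positive rate among the n_scored predictions.
    """
    positives = n_scored * prevalence
    true_alerts = recall * positives
    total_alerts = true_alerts / precision  # precision = true alerts / all alerts
    return {
        "true_alerts": true_alerts,
        "false_alerts": total_alerts - true_alerts,
        "missed_cases": positives - true_alerts,
        "false_per_true": (1 - precision) / precision,
    }
```

`alert_burden(1000, 0.114, 0.80, 0.35)` gives about 91 true alerts, 169 false alerts and 23 missed cases, which is the kind of trade-off to review with clinicians before fixing the threshold.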
Deliverables
- Build a binary classification model for 6-hour sepsis prediction.
- Describe preprocessing for mixed feature types and clinically meaningful missingness.
- Explain feature engineering choices, especially temporal and trend-based features.
- Define a validation strategy that avoids patient-level leakage.
- Report evaluation metrics, threshold selection, and feature importance.
- Recommend how the model would be deployed and monitored in production.
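For the leakage-free validation deliverable, the essential rule is that all rows from one patient, across every ICU stay and every scoring snapshot, fall in the same fold. A minimal sketch, assuming each row carries a hypothetical `patient_id` field (the brief describes stays, not patient identifiers), assigns folds with a salted hash so that assignments are reproducible across monthly retrains. The salt string is likewise hypothetical.

```python
import hashlib


def patient_fold(patient_id, n_folds=5, salt="sepsis-cv-v1"):
    """Deterministically map a patient to a fold via a salted SHA-256 hash.

    The salt value is hypothetical. hashlib is used instead of the built-in
    hash(), which is randomized per process for strings.
    """
    digest = hashlib.sha256(f"{salt}:{patient_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % n_folds


def grouped_folds(rows, n_folds=5):
    """Row indices per fold; every row of a given patient lands in one fold.

    rows: iterable of dicts with a hypothetical "patient_id" key.
    """
    folds = [[] for _ in range(n_folds)]
    for idx, row in enumerate(rows):
        folds[patient_fold(row["patient_id"], n_folds)].append(idx)
    return folds
```

Hash-based assignment gives only approximately equal fold sizes and does not stratify by label. scikit-learn's `GroupKFold` and `StratifiedGroupKFold`, with patient IDs passed as `groups`, are standard alternatives.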