Assess Readiness for Sepsis Model

Context

MedNova Health plans to deploy a gradient boosting model that predicts whether an emergency department patient will develop sepsis within 6 hours. The model will trigger an early-intervention workflow, including additional labs and physician review. Because this is a high-stakes clinical setting, leadership wants to know whether current performance is strong enough for deployment.

Current Performance

Validation was performed on 48,000 recent ED visits from a hospital not used in training. Sepsis prevalence in this set is 6.0% (2,880 cases).

Metric	Development CV	External Validation	Target
Precision	0.41	0.32	>= 0.30
Recall	0.86	0.74	>= 0.85
F1 Score	0.56	0.45	>= 0.50
AUC-ROC	0.91	0.84	>= 0.88
Calibration slope	0.98	0.71	0.90-1.10
False positive rate	0.08	0.11	<= 0.10
Alert rate	12.5%	13.9%	<= 12.0%
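
The headline metrics above can be turned back into approximate patient counts, which makes the size of the recall gap concrete. The sketch below is only a consistency check: it assumes the reported precision, recall, and prevalence are all measured on the same 48,000 visits at a single threshold, and the function name `implied_counts` is hypothetical.

```python
def implied_counts(n_visits, n_cases, precision, recall):
    """Reconstruct approximate confusion-matrix counts from summary metrics."""
    tp = recall * n_cases              # sepsis cases the model flags
    fn = n_cases - tp                  # sepsis cases the model misses
    alerts = tp / precision            # all flagged visits (TP + FP)
    fp = alerts - tp                   # unnecessary alerts
    tn = (n_visits - n_cases) - fp     # correctly unflagged visits
    return {
        "tp": tp,
        "fn": fn,
        "fp": fp,
        "tn": tn,
        "alert_rate": alerts / n_visits,
        "fpr": fp / (fp + tn),
        "f1": 2 * precision * recall / (precision + recall),
    }


# External validation column: 48,000 visits, 2,880 sepsis cases.
ext = implied_counts(48_000, 2_880, precision=0.32, recall=0.74)
for name, value in ext.items():
    print(f"{name}: {value:.4f}")
```

Under these assumptions the external set implies roughly 749 missed sepsis cases and an alert rate of about 13.9%, which matches the table. The implied false positive rate comes out near 0.100 rather than the reported 0.11; that gap may simply reflect rounding in the published figures, but it is worth confirming with the model team.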

The Problem

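The model looked strong in development, but external validation shows lower recall (0.74 vs. 0.86, below the 0.85 target) and weaker calibration (slope 0.71 vs. 0.98, outside the 0.90-1.10 target range). Missing sepsis cases is dangerous, while too many false alerts can overwhelm clinicians and erode trust in the tool.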

Requirements

Determine whether the model is ready for deployment in its current form.
Interpret the tradeoff between recall, precision, calibration, and alert volume.
Identify the biggest risks of deploying now in a high-stakes environment.
Recommend what additional validation or threshold changes are needed before launch.
Propose a safe rollout plan if deployment proceeds.
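
For the first requirement, one simple starting point is to check each external validation metric against its target from the table. The sketch below is illustrative only; the dictionary names and the `meets_target` helper are hypothetical, and a real readiness review would also look at confidence intervals rather than point estimates.

```python
# External validation values and targets, copied from the performance table.
EXTERNAL = {
    "precision": 0.32,
    "recall": 0.74,
    "f1": 0.45,
    "auc_roc": 0.84,
    "calibration_slope": 0.71,
    "false_positive_rate": 0.11,
    "alert_rate": 0.139,
}

TARGETS = {
    "precision": (">=", 0.30),
    "recall": (">=", 0.85),
    "f1": (">=", 0.50),
    "auc_roc": (">=", 0.88),
    "calibration_slope": ("range", (0.90, 1.10)),
    "false_positive_rate": ("<=", 0.10),
    "alert_rate": ("<=", 0.12),
}


def meets_target(value, rule):
    """Return True if a metric value satisfies its (kind, bound) rule."""
    kind, bound = rule
    if kind == ">=":
        return value >= bound
    if kind == "<=":
        return value <= bound
    low, high = bound  # kind == "range"
    return low <= value <= high


failures = [name for name, value in EXTERNAL.items()
            if not meets_target(value, TARGETS[name])]
print("Metrics missing target:", failures)
```

On the external validation numbers, only precision clears its target; every other metric misses.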

Constraints

ED clinicians can handle at most 550 model alerts per day.
A missed sepsis case has much higher cost than an unnecessary alert.
The hospital requires evidence of stable performance across age groups and sites before full deployment.
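
The first two constraints pull in opposite directions: lowering the threshold recovers recall but raises alert volume. A minimal sketch of one way to reason about this is to pick the lowest threshold whose alert volume still fits the 550-per-day cap. The daily ED volume is not given in the case, so `daily_visits` below is an assumption, as are the function names and the toy scores.

```python
def visits_supported(max_alerts_per_day, alert_rate):
    """Daily visit volume at which a given alert rate exactly fills the alert cap."""
    return max_alerts_per_day / alert_rate


def lowest_threshold_within_budget(scores, labels, daily_visits,
                                   max_alerts_per_day=550):
    """Lowest score threshold (highest recall) whose alert rate fits the budget.

    Quadratic in the number of distinct scores; fine for a sketch, not for
    production-sized validation sets.
    """
    max_alert_rate = max_alerts_per_day / daily_visits
    n = len(scores)
    n_pos = sum(labels)
    best = None
    for t in sorted(set(scores), reverse=True):
        flagged = [y for s, y in zip(scores, labels) if s >= t]
        alert_rate = len(flagged) / n
        if alert_rate > max_alert_rate:
            break  # any lower threshold would only add more alerts
        best = {"threshold": t,
                "alert_rate": alert_rate,
                "recall": sum(flagged) / n_pos}
    return best


# At the external alert rate of 13.9%, the cap is reached at about this many visits/day.
print(int(visits_supported(550, 0.139)))

# Toy example with made-up scores and a hypothetical volume of 1,000 visits/day
# and a cap of 450 alerts/day (so at most 45% of visits may be flagged).
toy_scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10, 0.05]
toy_labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
choice = lowest_threshold_within_budget(toy_scores, toy_labels,
                                        daily_visits=1000,
                                        max_alerts_per_day=450)
print(choice)
```

Because a missed case is costlier than an unnecessary alert, the sketch takes the lowest threshold that fits the budget instead of maximizing F1. The same selection would need to be repeated per age group and per site to address the third constraint.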
