MediScan Health operates urgent-care clinics that process roughly 12,000 patient visits per week. The clinical operations team wants a machine learning model that flags whether a patient is likely to have a target condition (for example, pneumonia) using structured intake, vitals, lab results, and symptom data to support physician triage—not replace clinical judgment.
You are given a historical supervised learning dataset built from prior visits with confirmed diagnoses.
| Feature Group | Count | Examples |
|---|---|---|
| Demographics | 5 | age, sex, BMI, smoking_status, pregnancy_flag |
| Vitals | 8 | temperature, heart_rate, respiratory_rate, systolic_bp, oxygen_saturation |
| Labs | 12 | WBC, CRP, hemoglobin, platelet_count, sodium, creatinine |
| Symptoms / history | 10 | cough, fever_duration_days, chest_pain, shortness_of_breath, asthma_history |
| Visit context | 5 | clinic_id, visit_hour, season, referral_source, prior_admissions_12m |
A good solution should achieve strong recall for true positive cases while keeping false alarms manageable for clinicians. Target performance is recall >= 0.85, precision >= 0.45, and ROC-AUC >= 0.88 on a held-out test set.