Business Context
MediScan Health wants an assistive screening model for chest X-rays to help radiologists prioritize likely pneumonia cases. The system is not a standalone diagnostic tool, but it must achieve high sensitivity and produce stable batch predictions for hospital workflows.
Dataset
Use a chest X-ray image classification dataset collected from 3 hospitals.
| Feature Group | Count | Examples |
|---|---|---|
| Image inputs | 1 | grayscale chest X-ray, resized to 224x224 |
| Metadata | 3 | hospital_id, patient_age_bucket, view_position |
| Target | 1 | pneumonia_label |
- Size: 28,400 studies, 1 image per study
- Target: binary classification, pneumonia (1) vs. normal (0)
- Class balance: 31% positive, 69% negative
- Missing data: 4% missing metadata; images may vary in brightness, contrast, and acquisition device
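The class balance above implies a moderate imbalance, which a weighted loss is one way to compensate for. The following is a minimal sketch of the arithmetic only. It assumes the 31% positive rate holds across all 28,400 studies, and it uses the negatives-to-positives ratio as the positive-class weight, which is a common convention rather than something this brief prescribes.

```python
# Counts implied by the dataset description.
# Assumption: the 31% positive rate applies to the full set of studies.
total_studies = 28_400
positive_rate = 0.31

n_positive = round(total_studies * positive_rate)  # pneumonia (1)
n_negative = total_studies - n_positive            # normal (0)

# Positive-class weight for a weighted binary cross-entropy loss:
# each positive example is weighted as much as pos_weight negatives.
pos_weight = n_negative / n_positive

print(n_positive, n_negative, round(pos_weight, 2))
```

Under these assumptions the split is 8,804 positive and 19,596 negative studies, giving a weight of roughly 2.23. The weight should be recomputed on the training split only, since the held-out sets must not influence training.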
Success Criteria
A good solution should achieve strong recall (sensitivity) on pneumonia cases while keeping false positives manageable for radiologist review. Target performance on a held-out test set is recall >= 0.90, precision >= 0.75, and AUC-ROC >= 0.92.
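To make the three targets concrete, here is a small dependency-free sketch of how they might be checked at a given decision threshold. The function names and the toy scores are illustrative assumptions, and the AUC is computed by direct pairwise comparison, which is fine for a small example but too slow for a full test set.

```python
def precision_recall(y_true, y_score, threshold):
    """Precision and recall for the positive class at a score threshold."""
    tp = fp = fn = 0
    for label, score in zip(y_true, y_score):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 1:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def auc_roc(y_true, y_score):
    """AUC-ROC as the probability that a random positive outscores a random
    negative (ties count as half). O(n_pos * n_neg), for illustration only."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def meets_targets(y_true, y_score, threshold,
                  min_recall=0.90, min_precision=0.75, min_auc=0.92):
    """True only if all three success criteria hold at this threshold."""
    precision, recall = precision_recall(y_true, y_score, threshold)
    return (recall >= min_recall
            and precision >= min_precision
            and auc_roc(y_true, y_score) >= min_auc)


# Toy labels and scores, invented purely to exercise the functions.
y_true = [1, 1, 1, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1, 0.05]
print(precision_recall(y_true, y_score, threshold=0.3))
print(auc_roc(y_true, y_score))
print(meets_targets(y_true, y_score, threshold=0.3))
```

Because recall and precision depend on the threshold while AUC-ROC does not, the threshold would normally be chosen on the validation split and then frozen before the test set is scored.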
Constraints
- The model must be explainable enough to support clinical review (for example, saliency or Grad-CAM heatmaps)
- Inference should complete in <150 ms per image on a GPU-backed service
- Training budget is limited to a single mid-range GPU
- Patient-level leakage must be avoided across train, validation, and test splits
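One way to satisfy the patient-level leakage constraint is to assign whole patients, rather than individual studies, to a split. The sketch below does this with a deterministic hash. Note that the dataset table does not list a patient identifier, so `patient_id` is an assumed extra field, and the 70/15/15 proportions and the `assign_split` helper are hypothetical choices for illustration.

```python
import hashlib


def assign_split(patient_id: str, train_pct: int = 70, val_pct: int = 15) -> str:
    """Map a patient to train/val/test deterministically.

    Every study from the same patient hashes to the same bucket, so no
    patient can appear in more than one split.
    """
    digest = hashlib.sha256(patient_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    if bucket < train_pct:
        return "train"
    if bucket < train_pct + val_pct:
        return "val"
    return "test"


# Hypothetical study records; patient_id is assumed to be available.
studies = [
    {"study_id": "s1", "patient_id": "p001"},
    {"study_id": "s2", "patient_id": "p001"},  # same patient, second study
    {"study_id": "s3", "patient_id": "p002"},
    {"study_id": "s4", "patient_id": "p003"},
]

split_of_study = {s["study_id"]: assign_split(s["patient_id"]) for s in studies}
print(split_of_study)
```

Hash-based assignment only approximates the requested proportions and does not stratify by `pneumonia_label` or `hospital_id`. If balanced splits matter, a grouped, stratified splitter is the more careful option.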
Deliverables
- Explain the core CNN architecture: convolution, activation, pooling, flattening/global pooling, and classification head
- Build a binary image classifier for pneumonia detection
- Describe preprocessing and augmentation choices for medical images
- Evaluate the model using clinically relevant metrics, not accuracy alone
- Discuss overfitting risks, class imbalance handling, and deployment considerations in a hospital setting
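As a starting point for the architecture explanation, the sketch below traces how a 224x224 grayscale input changes shape through a small stack of convolution, activation, and pooling blocks, followed by global average pooling and a single-logit classification head. The block widths (16, 32, 64, 128), the 3x3 kernels, and the 2x2 pooling are hypothetical choices, not requirements from this brief. The code does only the shape and parameter arithmetic and does not train a model.

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_params(c_in: int, c_out: int, kernel: int) -> int:
    """Weights plus biases of a 2D convolution with a square kernel."""
    return kernel * kernel * c_in * c_out + c_out


size, channels = 224, 1            # grayscale chest X-ray, 224x224
block_widths = [16, 32, 64, 128]   # hypothetical channel widths
total_params = 0

for width in block_widths:
    # Convolution: 3x3 kernel, padding 1, so spatial size is preserved.
    size = conv_out(size, kernel=3, stride=1, padding=1)
    total_params += conv_params(channels, width, kernel=3)
    # Activation (e.g. ReLU): no parameters, no change in shape.
    # Max pooling: 2x2 window, stride 2, halves the spatial size.
    size = conv_out(size, kernel=2, stride=2)
    channels = width
    print(f"after block with {width} channels: {size}x{size}x{channels}")

# Global average pooling collapses each feature map to one value,
# giving a vector with one entry per channel.
features = channels

# Classification head: one linear layer producing a single pneumonia logit.
total_params += features * 1 + 1

print("feature vector length:", features)
print("total parameters:", total_params)
```

With these assumed settings the feature maps shrink from 224 to 14 pixels per side while the channel count grows to 128. Global average pooling keeps the head small compared with flattening the 14x14x128 volume, and it preserves the final feature maps that Grad-CAM-style heatmaps are computed from.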