Business Context
OncoHealth, a leading oncology research organization, is analyzing clinical trial data to understand survival outcomes for patients undergoing treatment for lung cancer. The organization aims to identify significant predictors of survival and to visualize survival probabilities over time, in order to support treatment decisions and patient counseling.
Dataset
| Feature Group | Count | Examples |
|---|---|---|
| Patient Demographics | 5 | age, gender, ethnicity, smoking_status, performance_status |
| Treatment Details | 4 | treatment_type, dosage, treatment_duration, prior_treatments |
| Clinical Outcomes | 3 | event_observed, survival_time, follow_up_time |
- Size: 1,200 patients with 12 features
- Target: Time in days until the event (death) or censoring (no death observed during follow-up)
- Event rate: Death observed in 35% of patients; the remaining 65% are censored
- Missing data: 10% missing in treatment details, 5% missing in demographic data
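The target described above is a pair of values per patient: a duration and an event indicator. The following is a minimal sketch of that encoding, assuming `event_observed` is a boolean flag and `survival_time` is measured in days; the `Outcome` class, the `event_rate` helper, and the toy records are hypothetical and are not taken from the OncoHealth data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    """One patient's outcome (hypothetical container for illustration)."""
    survival_time: float   # days until death or censoring
    event_observed: bool   # True = death observed, False = censored


def event_rate(outcomes):
    """Fraction of patients whose death was observed (not censored)."""
    if not outcomes:
        raise ValueError("no outcomes supplied")
    return sum(o.event_observed for o in outcomes) / len(outcomes)


# Toy records invented for illustration only.
toy_outcomes = [
    Outcome(120, True),
    Outcome(365, False),
    Outcome(200, True),
    Outcome(540, False),
]
toy_rate = event_rate(toy_outcomes)
```

On the real dataset, the same calculation would be expected to return roughly 0.35, matching the stated event rate.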
Requirements
- Generate Kaplan-Meier survival curves for different treatment groups.
- Fit a Cox Proportional Hazards model to identify significant predictors of survival.
- Provide a summary of the model coefficients and their implications.
- Assess the proportional hazards assumption and report on its validity.
- Visualize the results and interpret the findings for clinical relevance.
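As an illustration of the first requirement, the sketch below computes Kaplan-Meier (product-limit) survival estimates per treatment group in plain Python. It is a teaching sketch under stated assumptions, not the project's implementation: the function names, the group labels "A" and "B", and the toy records are hypothetical, and in practice a survival library such as lifelines would likely be used for fitting and plotting.

```python
from collections import Counter, defaultdict


def kaplan_meier(durations, events):
    """Return [(t, S(t))] at each distinct event time (product-limit estimate)."""
    deaths = Counter(t for t, e in zip(durations, events) if e)
    exits = Counter(durations)   # patients leaving the risk set at t (death or censoring)
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


def km_by_group(records):
    """records: iterable of (group, duration_days, event_observed)."""
    grouped = defaultdict(lambda: ([], []))
    for group, duration, event in records:
        grouped[group][0].append(duration)
        grouped[group][1].append(event)
    return {g: kaplan_meier(d, e) for g, (d, e) in grouped.items()}


# Toy records invented for illustration; "A" and "B" stand in for treatment_type values.
toy_records = [
    ("A", 5, True), ("A", 8, True), ("A", 8, False), ("A", 12, True), ("A", 15, False),
    ("B", 3, True), ("B", 4, True), ("B", 9, False),
]
curves = km_by_group(toy_records)
```

Each curve is a step function: the survival estimate drops only at observed death times, while censored patients simply leave the risk set without causing a drop.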
Constraints
- Analysis must be reproducible and documented for peer review.
- The model must account for potential confounding variables, ensuring robust conclusions.
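One common way to account for a confounder is to include it as a covariate in the Cox model, so that the treatment coefficient is estimated with the confounder held fixed. The sketch below maximizes the Cox partial log-likelihood with plain gradient ascent, a deliberately simple stand-in for the Newton-type solvers that survival libraries typically use. The binary covariates `treated` and `smoker` are hypothetical encodings of `treatment_type` and `smoking_status`, the toy rows are invented and contain no tied event times, and the step size and iteration count are assumptions chosen for this small example.

```python
import math


def cox_loglik_and_grad(rows, beta):
    """Partial log-likelihood and gradient; rows are (time, event, covariates)."""
    loglik = 0.0
    grad = [0.0] * len(beta)
    for t_i, event_i, x_i in rows:
        if not event_i:
            continue  # censored patients contribute only through risk sets
        denom = 0.0
        weighted = [0.0] * len(beta)
        for t_j, _, x_j in rows:
            if t_j >= t_i:  # patient j is still at risk at time t_i
                w = math.exp(sum(b * v for b, v in zip(beta, x_j)))
                denom += w
                for k, v in enumerate(x_j):
                    weighted[k] += w * v
        loglik += sum(b * v for b, v in zip(beta, x_i)) - math.log(denom)
        for k, v in enumerate(x_i):
            grad[k] += v - weighted[k] / denom
    return loglik, grad


def fit_cox(rows, n_covariates, step=0.1, iterations=2000):
    """Gradient ascent on the partial log-likelihood, starting from beta = 0."""
    beta = [0.0] * n_covariates
    for _ in range(iterations):
        _, grad = cox_loglik_and_grad(rows, beta)
        beta = [b + step * g for b, g in zip(beta, grad)]
    return beta


# Toy rows invented for illustration: (days, event_observed, (treated, smoker)).
toy_rows = [
    (20, True, (0, 1)), (35, True, (1, 0)), (50, True, (0, 1)),
    (65, True, (1, 1)), (80, True, (0, 0)), (95, False, (0, 1)),
    (110, True, (0, 0)), (130, True, (1, 1)), (150, False, (1, 0)),
    (170, True, (0, 0)), (190, True, (1, 0)), (220, False, (1, 0)),
]
beta_hat = fit_cox(toy_rows, n_covariates=2)
hazard_ratios = [math.exp(b) for b in beta_hat]  # order: treated, smoker
```

A hazard ratio below 1 for `treated` would indicate a lower hazard of death for treated patients after adjusting for smoking. For peer-reviewed work, a tested library implementation that also reports standard errors and supports checks of the proportional hazards assumption would be preferable to this sketch.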