Data Society wants to predict whether a learner enrolled in a cohort-based training program will complete the course and earn a certificate. The model will be used in Data Society's internal learner success workflow to prioritize outreach for at-risk learners before the final project deadline.
You are given a historical dataset of learner enrollments from the last 24 months. The features fall into six groups:

| Feature Group | Count | Examples |
|---|---|---|
| Demographics | 6 | region, years_experience, job_function |
| Enrollment metadata | 5 | program_track, cohort_size, funding_source, enrollment_channel |
| Engagement | 14 | sessions_attended, attendance_rate, days_since_last_login, assignments_submitted |
| Assessment | 8 | quiz_avg, project_checkpoint_score, late_submission_count |
| Support interactions | 4 | mentor_messages_sent, office_hours_attended |
| Temporal | 5 | week_of_program, enrollment_month, days_active_last_14d |

Target: completed_certificate (1 if the learner completed the program, 0 otherwise).

A good solution should improve intervention targeting over a naive baseline and support operational use. Aim for ROC-AUC >= 0.82 and F1 >= 0.72 on a held-out test set, and explain why the selected model is more appropriate than reasonable alternatives.
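
The two metric targets can be checked with a small helper like the one below. This is only a sketch under stated assumptions: the function names (`roc_auc`, `f1_score`, `evaluate`), the 0.5 decision threshold, and the choice of completed_certificate = 1 as the positive class for F1 are illustrative and not part of the brief. The toy labels and scores are made up to show the calls. A constant-score predictor stands in for the naive baseline.

```python
from typing import Dict, Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC via the rank-sum identity, using average ranks for tied scores."""
    pairs = sorted(zip(scores, y_true))
    n = len(pairs)
    n_pos = sum(y for _, y in pairs)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC-AUC needs both classes in y_true")
    rank_sum_pos = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of 1-based ranks i+1 .. j
        rank_sum_pos += avg_rank * sum(y for _, y in pairs[i:j])
        i = j
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the positive class (assumed here: completed_certificate == 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def evaluate(
    y_true: Sequence[int],
    scores: Sequence[float],
    threshold: float = 0.5,  # assumed cutoff; tune on validation data
    min_auc: float = 0.82,   # target from the brief
    min_f1: float = 0.72,    # target from the brief
) -> Dict[str, object]:
    """Score held-out predictions against the ROC-AUC and F1 targets."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    auc = roc_auc(y_true, scores)
    f1 = f1_score(y_true, y_pred)
    return {"roc_auc": auc, "f1": f1, "passes": auc >= min_auc and f1 >= min_f1}


# Toy held-out labels and model scores (illustrative only, not real data).
toy_y = [1, 1, 1, 0, 0, 1, 0, 1]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.65, 0.2, 0.4]
model_report = evaluate(toy_y, toy_scores)

# Naive baseline: every learner gets the same score (the base completion rate).
base_rate = sum(toy_y) / len(toy_y)
baseline_report = evaluate(toy_y, [base_rate] * len(toy_y))

print("model:   ", model_report)
print("baseline:", baseline_report)
```

A constant-score baseline always has ROC-AUC 0.5, so it gives a floor for the ranking metric. Because outreach targets learners who are unlikely to complete, it may also be worth reporting F1 with completed_certificate = 0 as the positive class.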