Business Context
The Choctaw Nation of Oklahoma receives service requests across programs such as health, housing, education, and workforce support. You need to show the practical difference between supervised and unsupervised learning by building one labeled prediction system and one unlabeled segmentation workflow on the same operational dataset.
Dataset
You are given a historical dataset of service interactions from the Choctaw Nation of Oklahoma citizen services platform.
| Feature Group | Count | Examples |
|---|---|---|
| Demographics | 6 | age_band, county, veteran_status, household_size |
| Program history | 8 | prior_program_count, last_program_type, days_since_last_service |
| Request details | 7 | intake_channel, request_type, urgency_score, appointment_needed |
| Behavioral/usage | 5 | portal_logins_30d, missed_appointments_12m, document_upload_count |
| Text-derived | 4 | request_summary_length, keyword flags from intake notes |
- Size: 42K service requests, 30 features
- Labeled target available for one task: assigned_program, with 4 classes (Health, Housing, Education, Workforce)
- Unlabeled task: discover natural request segments without using assigned_program
- Missing data: 8% missing in portal usage fields, 12% missing in intake-note-derived features
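
One common way to handle the missing portal usage and intake-note fields is to fill each gap with a column statistic and keep a 0/1 flag recording that the value was absent, so a model can still learn from the missingness itself. The sketch below is a minimal standard-library illustration, not a prescribed approach: the helper name `impute_with_flag` and the toy rows are assumptions made up for this example. Libraries such as scikit-learn offer an equivalent through `SimpleImputer(add_indicator=True)`.

```python
from statistics import median

def impute_with_flag(rows, column):
    """Fill missing values in `column` with the observed median and
    add a `<column>_missing` 0/1 indicator. Returns new row dicts."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = median(observed)
    out = []
    for r in rows:
        missing = r[column] is None
        new_row = dict(r)  # copy so the input rows stay untouched
        new_row[column] = fill if missing else r[column]
        new_row[f"{column}_missing"] = int(missing)
        out.append(new_row)
    return out

# Toy rows, invented for illustration (not drawn from the real dataset).
rows = [
    {"portal_logins_30d": 4},
    {"portal_logins_30d": None},
    {"portal_logins_30d": 10},
    {"portal_logins_30d": 1},
]
imputed = impute_with_flag(rows, "portal_logins_30d")
```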
Success Criteria
A strong solution should:
- Achieve macro F1 >= 0.72 on the supervised classification task
- Produce interpretable clusters with silhouette score >= 0.20 and clear operational meaning
- Clearly explain when supervised learning is appropriate vs when unsupervised learning is appropriate
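
The two numeric criteria above measure different things: macro F1 needs true labels and weights every class equally, while the silhouette score needs only the feature vectors and the cluster assignments. The sketch below computes both from scratch to make the definitions concrete. It assumes Euclidean distance, at least two clusters, and toy inputs invented for illustration. In practice, scikit-learn's `f1_score(..., average="macro")` and `silhouette_score` compute the same quantities.

```python
from math import dist

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

def silhouette(points, labels):
    """Mean silhouette coefficient (Euclidean). Needs >= 2 clusters."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            continue  # a singleton cluster contributes 0 by convention
        # a: mean distance to the other members of the same cluster
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the nearest other cluster
        b = min(sum(dist(p, q) for q in other) / len(other)
                for k, other in clusters.items() if k != l)
        denom = max(a, b)
        total += (b - a) / denom if denom else 0.0
    return total / len(points)

# Toy labels and predictions (illustrative only).
y_true = ["Health", "Health", "Housing", "Housing", "Education", "Workforce"]
y_pred = ["Health", "Housing", "Housing", "Housing", "Education", "Workforce"]
f1 = macro_f1(y_true, y_pred)

# Toy 2-D points in two well-separated clusters (illustrative only).
points = [(0, 0), (0, 1), (5, 5), (5, 6)]
sil = silhouette(points, [0, 0, 1, 1])

meets_targets = f1 >= 0.72 and sil >= 0.20
```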
Constraints
- Predictions should be fast enough for near-real-time routing in the intake workflow (<100 ms per request)
- Program staff need interpretable outputs, not a black-box-only approach
- Retraining should be simple enough for a small internal data team to maintain monthly
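
The latency constraint can be checked empirically by timing single-request predictions and looking at a high percentile rather than the mean. The sketch below is one possible harness under stated assumptions: `dummy_predict` is a hypothetical stand-in for whatever model is eventually trained, the 95th percentile is an assumed choice (the brief only states the 100 ms budget), and the 0-to-1 range used for `urgency_score` is invented for the example.

```python
import time

def latency_ms(predict, requests, quantile=0.95):
    """Time `predict` on each request; return the chosen quantile in ms."""
    timings = []
    for req in requests:
        start = time.perf_counter()
        predict(req)
        timings.append((time.perf_counter() - start) * 1000.0)
    timings.sort()
    index = min(len(timings) - 1, int(quantile * len(timings)))
    return timings[index]

def dummy_predict(request):
    # Hypothetical placeholder for a trained classifier's predict call.
    return "Health" if request["urgency_score"] > 0.5 else "Housing"

# Synthetic requests; the urgency_score scale here is an assumption.
sample = [{"urgency_score": i / 100} for i in range(100)]
p95 = latency_ms(dummy_predict, sample)
within_budget = p95 < 100.0  # the <100 ms per-request constraint
```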
Deliverables
- Build a supervised learning model to predict assigned_program.
- Build an unsupervised learning workflow to segment requests without labels.
- Compare the two approaches: objective, inputs, outputs, evaluation, and business use cases.
- Describe preprocessing choices for mixed data types and missing values.
- Recommend how the Choctaw Nation of Oklahoma should use each model in production.
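
For the preprocessing deliverable, a typical treatment of mixed data types is to one-hot encode categorical fields and standardize numeric ones before combining them into a single feature vector. Scaling matters most for the distance-based clustering step. The sketch below illustrates the idea with the standard library only. The category values for intake_channel and the numeric values for household_size are assumptions invented for the example, and the helper names are hypothetical.

```python
from statistics import mean, pstdev

def one_hot(values):
    """Return (sorted categories, one 0/1 row per input value)."""
    categories = sorted(set(values))
    return categories, [[int(v == c) for c in categories] for v in values]

def standardize(values):
    """Rescale to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma if sigma else 0.0 for v in values]

# Toy columns; the specific values are made up for illustration.
intake_channel = ["phone", "portal", "walk_in", "portal"]
household_size = [2, 4, 6, 4]

categories, encoded = one_hot(intake_channel)
scaled = standardize(household_size)

# One combined numeric row per request: one-hot columns, then the scaled value.
features = [row + [s] for row, s in zip(encoded, scaled)]
```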