Classify Tribal Services with ML

Business Context

Choctaw Nation of Oklahoma receives service requests across programs such as health, housing, education, and workforce support. You need to show the practical difference between supervised and unsupervised learning by building one labeled prediction system and one unlabeled segmentation workflow on the same operational dataset.

Dataset

You are given a historical dataset of service interactions from the Choctaw Nation of Oklahoma citizen services platform.

Feature Group	Count	Examples
Demographics	6	age_band, county, veteran_status, household_size
Program history	8	prior_program_count, last_program_type, days_since_last_service
Request details	7	intake_channel, request_type, urgency_score, appointment_needed
Behavioral/usage	5	portal_logins_30d, missed_appointments_12m, document_upload_count
Text-derived	4	request_summary_length, keyword flags from intake notes

Size: 42K service requests, 30 features
Labeled target available for one task: assigned_program with 4 classes: Health, Housing, Education, Workforce
Unlabeled task: discover natural request segments without using assigned_program
Missing data: 8% missing in portal usage fields, 12% missing in intake-note-derived features
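One possible way to handle the mixed data types and missing values described above is sketched here with scikit-learn. The four toy rows and their values are invented for illustration. The column names come from the feature table, but their dtypes and the choice of median imputation with a missing-value indicator are assumptions, not part of the problem statement.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy rows only; every value here is made up for illustration.
df = pd.DataFrame({
    "age_band": ["18-29", "30-44", "45-64", "30-44"],
    "intake_channel": ["phone", "portal", "walk_in", "portal"],
    "urgency_score": [3.0, 1.0, 4.0, 2.0],
    "portal_logins_30d": [np.nan, 5.0, np.nan, 2.0],  # portal usage has gaps
})

numeric_cols = ["urgency_score", "portal_logins_30d"]
categorical_cols = ["age_band", "intake_channel"]

# Numeric: fill gaps with the median and add a 0/1 "was missing" column,
# so the model can still learn from the fact that a value was absent.
numeric_pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median", add_indicator=True)),
    ("scale", StandardScaler()),
])

# Categorical: treat a missing value as its own explicit category.
categorical_pipe = Pipeline([
    ("impute", SimpleImputer(strategy="constant", fill_value="missing")),
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
])

preprocess = ColumnTransformer(
    [
        ("num", numeric_pipe, numeric_cols),
        ("cat", categorical_pipe, categorical_cols),
    ],
    sparse_threshold=0.0,  # always return a dense array
)

X = preprocess.fit_transform(df)
print(X.shape)
```

The same fitted transformer can feed both the classifier and the clustering step, which keeps the two workflows comparable on identical inputs.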

Success Criteria

A strong solution should:

Achieve macro F1 >= 0.72 on the supervised classification task
Produce interpretable clusters with silhouette score >= 0.20 and clear operational meaning
Clearly explain when supervised learning is appropriate vs when unsupervised learning is appropriate
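To make the two quantitative criteria concrete, the sketch below computes macro F1 and a silhouette score with scikit-learn. The labels, predictions, and points are tiny invented examples, not results from the dataset.

```python
import numpy as np
from sklearn.metrics import f1_score, silhouette_score

# Macro F1: per-class F1 averaged with equal weight, so a small class
# counts as much as a large one. Labels below are illustrative only.
y_true = ["Health", "Health", "Housing", "Housing", "Education", "Workforce"]
y_pred = ["Health", "Housing", "Housing", "Housing", "Education", "Workforce"]
macro_f1 = f1_score(y_true, y_pred, average="macro")
print(f"macro F1: {macro_f1:.3f}")

# Silhouette: compares each point's distance to its own cluster with its
# distance to the nearest other cluster. It ranges from -1 to 1 and needs
# no ground-truth labels, only the features and the cluster assignments.
points = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
cluster_ids = np.array([0, 0, 1, 1])
sil = silhouette_score(points, cluster_ids)
print(f"silhouette: {sil:.3f}")
```

In this toy case, one Health request misrouted to Housing lowers the F1 of both classes, which is why macro F1 is sensitive to errors on any single program.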

Constraints

Predictions should be fast enough for near-real-time routing in the intake workflow (<100 ms per request)
Program staff need interpretable outputs, not a black-box-only approach
Retraining should be simple enough for a small internal data team to maintain monthly
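A rough way to check the latency and interpretability constraints is sketched below. It assumes a multinomial logistic regression as the candidate model, which the problem statement does not prescribe, and uses random data shaped like the dataset (30 features, 4 classes). Measured timings depend on the machine, so no expected value is shown.

```python
import time

import numpy as np
from sklearn.linear_model import LogisticRegression

# Random stand-in data: 30 features and 4 classes, matching the problem's shape.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 30))
y = rng.integers(0, 4, size=2000)

model = LogisticRegression(max_iter=500).fit(X, y)

# Time single-request scoring, since routing happens one request at a time.
one_request = X[:1]
timings_ms = []
for _ in range(200):
    start = time.perf_counter()
    model.predict_proba(one_request)
    timings_ms.append((time.perf_counter() - start) * 1000)

p95_ms = float(np.percentile(timings_ms, 95))
print(f"p95 latency: {p95_ms:.3f} ms (budget: 100 ms)")

# Interpretability: one coefficient row per class, one column per feature.
print(model.coef_.shape)
```

In a real pipeline the timing should wrap preprocessing plus prediction, because encoding and imputation also run on every incoming request.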

Deliverables

Build a supervised learning model to predict assigned_program.
Build an unsupervised learning workflow to segment requests without labels.
Compare the two approaches: objective, inputs, outputs, evaluation, and business use cases.
Describe preprocessing choices for mixed data types and missing values.
Recommend how Choctaw Nation of Oklahoma should use each model in production.
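The first three deliverables can be sketched side by side as follows. This is a minimal illustration on synthetic blobs, not the real dataset: the array `y` stands in for assigned_program, and logistic regression and k-means are assumed model choices. The scores it prints say nothing about what is achievable on actual service requests.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, silhouette_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 30 numeric features, 4 groups playing the role of programs.
X, y = make_blobs(n_samples=1200, centers=4, n_features=30,
                  cluster_std=3.0, random_state=0)

# Supervised: learn a mapping from features to the known label,
# then evaluate on held-out requests against that label.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=500))
clf.fit(X_train, y_train)
macro_f1 = f1_score(y_test, clf.predict(X_test), average="macro")
print(f"supervised macro F1: {macro_f1:.3f}")

# Unsupervised: the label is never passed in. The output is a segment id
# per request, evaluated by cluster geometry rather than by accuracy.
X_scaled = StandardScaler().fit_transform(X)
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
segments = kmeans.fit_predict(X_scaled)
sil = silhouette_score(X_scaled, segments)
print(f"unsupervised silhouette: {sil:.3f}")

# Post-hoc profiling only: compare segments with labels after clustering
# to help staff name the segments. Labels were not used to form them.
profile = pd.crosstab(pd.Series(segments, name="segment"),
                      pd.Series(y, name="label"))
print(profile)
```

The contrast to draw out is in the outputs: the classifier returns a program name that can drive routing directly, while the clustering returns anonymous segment ids that need human interpretation before they support any decision.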
