Model Student Success at UK

Business Context

The University of Kentucky wants to better support first-year students using data from UK Canvas, myUK, and advising systems. You need to demonstrate the practical difference between supervised and unsupervised learning by solving two related problems on the same student dataset: predicting academic risk and discovering student engagement segments.

Dataset

You are given a term-level dataset covering 24,000 undergraduate students over 6 semesters (about 96,000 student-term records, or roughly four enrolled terms per student on average) with behavioral, academic, and demographic features.

Feature Group	Count	Examples
Academic history	10	prior_gpa, credits_attempted, credits_completed, dropped_courses
LMS engagement (UK Canvas)	12	weekly_logins, assignment_submissions, discussion_posts, late_submissions
Advising & support	6	advising_visits, tutoring_sessions, holds_count, financial_aid_changes
Enrollment & demographics	8	residency_status, major_college, class_year, first_gen_flag
Temporal features	5	week_of_term aggregates, trend in logins, trend in grades

Target for supervised task: risk_flag = 1 if student ends the term with GPA < 2.0 or withdraws, else 0
Unsupervised task: no label; identify meaningful student segments for intervention design
Class balance: about 18% positive for risk_flag
Missing data: 12% missing in advising/tutoring features, 4% missing in LMS features for late-added courses
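
The target definition above can be made concrete with a small sketch. The field names `term_gpa` and `withdrew` are hypothetical (the prompt does not name the source columns), and treating a missing GPA without a withdrawal as "not at risk" is an assumption you would want to confirm with the data owners.

```python
from typing import Optional


def risk_flag(term_gpa: Optional[float], withdrew: bool) -> int:
    """Return 1 if the student withdrew or ended the term with GPA < 2.0, else 0."""
    if withdrew:
        return 1
    # Assumption: a missing GPA with no withdrawal is labeled 0 in this sketch.
    if term_gpa is None:
        return 0
    return int(term_gpa < 2.0)


def positive_rate(flags):
    """Share of records labeled at risk; useful for checking class balance."""
    return sum(flags) / len(flags) if flags else 0.0


# Toy student-term records (made up for illustration, not real UK data).
records = [
    {"term_gpa": 3.4, "withdrew": False},
    {"term_gpa": 1.7, "withdrew": False},
    {"term_gpa": 2.0, "withdrew": False},  # exactly 2.0 is not below the cutoff
    {"term_gpa": None, "withdrew": True},
    {"term_gpa": 2.9, "withdrew": False},
]
flags = [risk_flag(r["term_gpa"], r["withdrew"]) for r in records]
print(flags, positive_rate(flags))
```

On the full dataset, the same positive-rate check should land near the stated 18%; a large deviation would suggest the label was derived differently.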

Success Criteria

A strong solution should:

Achieve ROC-AUC >= 0.82 and F1 >= 0.55 on the supervised risk model
Produce 3-6 interpretable clusters with a silhouette score >= 0.20 for the unsupervised task
Clearly explain when supervised learning is appropriate versus when unsupervised learning is more useful
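
As a rough illustration of how the supervised thresholds could be checked, the sketch below computes ROC-AUC and F1 in plain Python. The labels, scores, and the 0.5 decision threshold are made-up assumptions; in practice a library such as scikit-learn (`roc_auc_score`, `f1_score`) would normally be used, and the pairwise AUC here is quadratic, so it only suits small samples.

```python
def roc_auc(y_true, scores):
    """ROC-AUC as the chance a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1_at_threshold(y_true, scores, threshold):
    """F1 for the positive class after thresholding the risk scores."""
    preds = [int(s >= threshold) for s in scores]
    tp = sum(1 for y, p in zip(y_true, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, preds) if y == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def meets_supervised_criteria(y_true, scores, threshold=0.5):
    """Check the stated targets: ROC-AUC >= 0.82 and F1 >= 0.55."""
    return (roc_auc(y_true, scores) >= 0.82
            and f1_at_threshold(y_true, scores, threshold) >= 0.55)


# Toy held-out labels and predicted risk scores (illustrative only).
y_true = [0, 0, 0, 0, 1, 0, 0, 1, 0, 0]
scores = [0.10, 0.20, 0.15, 0.60, 0.80, 0.30, 0.05, 0.55, 0.40, 0.25]

print(roc_auc(y_true, scores), f1_at_threshold(y_true, scores, 0.5))
```

Note that ROC-AUC is threshold-free while F1 depends on the chosen cutoff, so with an 18% positive class the threshold is worth tuning on a validation set rather than fixing at 0.5.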

Constraints

The risk model will score about 30,000 active students in a weekly batch run
Student success staff need interpretable outputs, not a black-box-only solution
FERPA-sensitive data should be minimized in production features

Deliverables

Build a supervised classification model to predict risk_flag
Build an unsupervised clustering model to segment students
Compare the two approaches: objective, inputs, outputs, evaluation, and business use
Recommend which approach should power early-alert workflows in UK advising
Provide feature importance and cluster profiles suitable for non-technical stakeholders
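
For the clustering deliverables, one possible way to produce a quality check and stakeholder-friendly cluster profiles is sketched below. The points, cluster labels, and the two features shown are invented for illustration, the cluster assignments are assumed to come from whatever clustering model you fit, and features would normally be scaled first. The silhouette here is a plain-Python O(n^2) version; scikit-learn's `silhouette_score` is the usual choice on real data.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette(points, labels):
    """Mean silhouette over all points; quadratic cost, so sample large datasets."""
    clusters = defaultdict(list)
    for idx, lab in enumerate(labels):
        clusters[lab].append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            continue  # a singleton cluster contributes 0 by convention
        a = sum(euclidean(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items() if other != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def cluster_profiles(points, labels, feature_names):
    """Per-cluster size and feature means, readable by non-technical staff."""
    sums = defaultdict(lambda: [0.0] * len(feature_names))
    counts = defaultdict(int)
    for p, lab in zip(points, labels):
        counts[lab] += 1
        for k, v in enumerate(p):
            sums[lab][k] += v
    return {
        lab: {"n_students": counts[lab],
              **{name: sums[lab][k] / counts[lab] for k, name in enumerate(feature_names)}}
        for lab in counts
    }


# Toy data: two features per student and hypothetical cluster assignments.
feature_names = ["weekly_logins", "late_submissions"]
points = [(10, 0), (12, 1), (11, 0), (2, 5), (3, 6), (1, 4)]
labels = [0, 0, 0, 1, 1, 1]

score = silhouette(points, labels)
profiles = cluster_profiles(points, labels, feature_names)
print(round(score, 3), profiles)
```

A profile table like this (cluster size plus average logins, late submissions, and so on) lets advising staff name each segment in plain language before deciding which interventions fit it.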
