Screen Internship Candidates with ML

Business Context

TalentMatch, a recruiting platform processing ~120K internship applications per hiring cycle, wants a model to predict whether a candidate will pass the first-round technical screen. Recruiters need a practical baseline model to prioritize reviews while keeping the process explainable and fair.

Dataset

You are given historical candidate records from internships and student projects. Each row represents one application.

Feature Group	Count	Examples
Education	6	degree_level, major, GPA, graduation_year
Experience	8	internship_count, project_count, months_experience, hackathon_count
Skills	10	python, sql, sklearn, pytorch, cloud, statistics
Assessment	5	resume_score, coding_test_score, communication_score
Metadata	4	university_tier, region, referral_flag, application_source

Size: 48K applications, 33 features
Target: Binary — passed first-round screen (1) vs not passed (0)
Class balance: Moderately imbalanced, 28% positive and 72% negative
Missing data: 12% missing in GPA, 18% missing in coding_test_score, sparse skill indicators for some resumes
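As a rough illustration of what the 28% / 72% split implies for training, the sketch below computes the inverse-frequency ("balanced") class weights that many libraries offer for imbalanced targets. The function name `balanced_class_weights` is hypothetical, and the row counts are derived from the stated size and class balance rather than from the actual data.

```python
def balanced_class_weights(counts):
    """Inverse-frequency weights: total / (n_classes * class_count)."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * c) for label, c in counts.items()}


# Assumed counts, derived from "48K applications" and "28% positive".
n_rows = 48_000
n_pos = round(0.28 * n_rows)
counts = {1: n_pos, 0: n_rows - n_pos}

weights = balanced_class_weights(counts)
for label, w in weights.items():
    print(f"class {label}: n={counts[label]}, weight={w:.3f}")
```

Under these assumptions the positive class is weighted roughly 2.6 times as heavily as the negative class, which is one simple way to address the imbalance without resampling.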

Success Criteria

A good solution should achieve ROC-AUC >= 0.82, F1 >= 0.68, and recall >= 0.75 for the positive class at an operational threshold. The model should also provide interpretable feature importance for recruiters.
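One way to read "recall >= 0.75 at an operational threshold" is to pick the highest score cutoff that still meets the recall floor, then report precision and F1 at that cutoff. The sketch below does this in plain Python on a tiny made-up example; `pick_threshold` and the toy labels and scores are illustrative assumptions, not part of the task data.

```python
def pick_threshold(y_true, scores, min_recall=0.75):
    """Return (threshold, precision, recall, f1) for the highest
    threshold whose recall is at least min_recall, or None."""
    total_pos = sum(y_true)
    if total_pos == 0:
        raise ValueError("y_true contains no positive examples")
    # Recall can only grow as the threshold drops, so the first
    # qualifying threshold in descending order is the highest one.
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        recall = tp / total_pos
        if recall >= min_recall:
            precision = tp / (tp + fp)
            f1 = 2 * precision * recall / (precision + recall)
            return t, precision, recall, f1
    return None


# Toy data (made up): 4 positives, 6 negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1, 0.1]

threshold, precision, recall, f1 = pick_threshold(y_true, scores)
print(threshold, precision, recall, f1)
```

The highest threshold that satisfies the recall floor is not necessarily the one that maximizes F1, so in practice both would be checked against the stated targets on validation data.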

Constraints

Batch scoring only; daily inference on up to 10K new applications
Recruiters need understandable drivers behind predictions
Training must fit on a standard CPU machine within 30 minutes
Avoid leakage from post-screen outcomes or manually added recruiter notes
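The leakage constraint can be enforced mechanically before any modeling, for example by separating out columns whose names suggest post-screen outcomes or recruiter notes. The sketch below assumes a naming convention that the task does not specify: the markers `post_screen` and `recruiter_note`, and the two leaky column names in the example, are hypothetical.

```python
# Hypothetical substrings that flag leaky columns; adjust to the real schema.
LEAKY_MARKERS = ("post_screen", "recruiter_note")


def split_leaky_columns(columns, markers=LEAKY_MARKERS):
    """Partition column names into (safe, leaky) by substring match."""
    safe, leaky = [], []
    for name in columns:
        is_leaky = any(marker in name.lower() for marker in markers)
        (leaky if is_leaky else safe).append(name)
    return safe, leaky


columns = [
    "GPA",
    "coding_test_score",
    "referral_flag",
    "recruiter_notes_text",         # hypothetical column
    "post_screen_interview_score",  # hypothetical column
]
safe, leaky = split_leaky_columns(columns)
print("keep:", safe)
print("drop:", leaky)
```

A name-based filter like this is only a first line of defense; features recorded after the screening decision still need to be identified from how the data was collected.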

Deliverables

Build a classification pipeline to predict first-round screen pass/fail.
Explain model choice versus a simpler baseline such as logistic regression.
Handle missing values, categorical variables, and class imbalance appropriately.
Evaluate with cross-validation and a held-out test set.
Report feature importance and threshold tradeoffs for recruiter operations.
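The deliverables above could be approached along the following lines. This is a minimal sketch of the logistic regression baseline only, assuming scikit-learn, NumPy, and pandas are available. It runs on synthetic stand-in data with a handful of the listed column names; the data-generating rule, the 2,000-row size, and all hyperparameters are assumptions made for illustration, not properties of the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data (assumed, not the real TalentMatch records).
rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "GPA": rng.normal(3.2, 0.4, n),
    "coding_test_score": rng.normal(60, 15, n),
    "internship_count": rng.integers(0, 4, n),
    "university_tier": rng.choice(["tier_1", "tier_2", "tier_3"], n),
    "referral_flag": rng.integers(0, 2, n),
})
signal = (0.08 * (df["coding_test_score"] - 60)
          + 1.5 * (df["GPA"] - 3.2)
          + 0.4 * df["internship_count"])
y = (signal + rng.normal(0, 1, n) > 1.6).astype(int)

# Inject missingness at roughly the rates quoted in the dataset description.
df.loc[rng.random(n) < 0.12, "GPA"] = np.nan
df.loc[rng.random(n) < 0.18, "coding_test_score"] = np.nan

numeric = ["GPA", "coding_test_score", "internship_count", "referral_flag"]
categorical = ["university_tier"]

preprocess = ColumnTransformer([
    # Median imputation plus missing-value indicator columns, then scaling.
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median", add_indicator=True)),
        ("scale", StandardScaler()),
    ]), numeric),
    # One-hot encode categoricals; unseen categories are ignored at scoring time.
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])

model = Pipeline([
    ("prep", preprocess),
    ("clf", LogisticRegression(class_weight="balanced", max_iter=1000)),
])

# Held-out test set, with stratified cross-validation on the training part.
X_train, X_test, y_train, y_test = train_test_split(
    df, y, test_size=0.2, stratify=y, random_state=0
)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
cv_auc = cross_val_score(model, X_train, y_train, cv=cv, scoring="roc_auc")

model.fit(X_train, y_train)
test_auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"CV ROC-AUC: {cv_auc.mean():.3f}, held-out ROC-AUC: {test_auc:.3f}")
```

Keeping imputation and encoding inside the pipeline means they are refit within each cross-validation fold, so no statistics from validation rows leak into training. A more flexible model, such as gradient-boosted trees, could be swapped in for the `clf` step and compared against this baseline on the same splits.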
