Business Context
Northstar Lending is building a binary classification model to predict whether a personal loan applicant will default within 12 months. The risk team wants a feature selection approach that improves generalization, reduces overfitting, and keeps the final model interpretable for compliance review.
Dataset
You are given a historical underwriting dataset with applicant, credit, and behavioral variables collected at application time.
| Feature Group | Count | Examples |
|---|---|---|
| Applicant demographics | 8 | age, employment_length, home_ownership, region |
| Financial attributes | 14 | annual_income, debt_to_income, revolving_utilization, open_credit_lines |
| Credit history | 11 | fico_band, delinquencies_2y, inquiries_6m, public_records |
| Application metadata | 7 | channel, loan_purpose, requested_amount, term_months |
| Engineered candidates | 20 | income_per_open_line, utilization_x_inquiries, log_income, missingness flags |
- Size: 120K loan applications, 60 candidate features
- Target: default_12m, where 1 = defaulted within 12 months and 0 = did not default
- Class balance: 18% positive, 82% negative
- Missing data: 12% missing in employment and income-related fields; 3% missing in bureau variables
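Because only 18% of applications are positive, any train/validation/test split should preserve that ratio in every partition. The sketch below shows one way to do this with scikit-learn's `train_test_split` and its `stratify` argument. The data is synthetic, and the 70/15/15 proportions are an assumption for illustration, not a requirement of this brief.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data: 5 numeric columns, ~18% positives.
rng = np.random.default_rng(42)
n = 10_000
y = (rng.random(n) < 0.18).astype(int)
X = rng.normal(size=(n, 5))

# Assumed 70/15/15 split. First carve off 30%, then halve it into val and test.
X_train, X_hold, y_train, y_hold = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42
)
X_val, X_test, y_val, y_test = train_test_split(
    X_hold, y_hold, test_size=0.50, stratify=y_hold, random_state=42
)

# Each partition should show roughly the same positive rate as the full data.
for name, part in [("train", y_train), ("val", y_val), ("test", y_test)]:
    print(name, len(part), round(float(part.mean()), 3))
```

Fixing `random_state` keeps the split reproducible, which matters for the compliance review described above.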
Success Criteria
A good solution should:
- Achieve ROC-AUC >= 0.78 on the held-out test set
- Reduce the feature set from 60 candidates to a smaller, defensible subset without materially hurting performance
- Produce a feature selection process that is reproducible and explainable to risk and compliance stakeholders
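For reference, ROC-AUC can be read as the probability that a randomly chosen defaulter receives a higher score than a randomly chosen non-defaulter. The following is a minimal pure-Python sketch of that rank-based (Mann-Whitney) formulation. The helper names `roc_auc` and `meets_target` are illustrative, and in practice a library routine such as scikit-learn's `roc_auc_score` would normally be used instead.

```python
from typing import Sequence

AUC_TARGET = 0.78  # threshold taken from the success criteria


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC via the Mann-Whitney U statistic; ties count as half a win."""
    # Rank all scores ascending, giving tied scores their average rank.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC-AUC needs at least one positive and one negative")

    # U statistic of the positive class, normalised to the range [0, 1].
    rank_sum_pos = sum(r for r, y in zip(ranks, y_true) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def meets_target(y_true, scores, target: float = AUC_TARGET) -> bool:
    """True when the scores reach the ROC-AUC target."""
    return roc_auc(y_true, scores) >= target
```

Because the metric depends only on the ranking of scores, it is unaffected by the 18/82 class imbalance in the way accuracy would be.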
Constraints
- The final model must remain interpretable enough for model risk management review
- Training runs should finish within 30 minutes on a standard CPU machine
- Feature selection must avoid leakage: selectors are fit using only the training folds, never validation or test data
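One common way to satisfy the leakage constraint is to place imputation, scaling, and feature selection inside a scikit-learn `Pipeline`, so that cross-validation refits every step on the training portion of each fold. The sketch below is illustrative only. It assumes all-numeric features, uses synthetic data from `make_classification`, and the choices of `f_classif`, `k=15`, and median imputation are placeholders rather than recommendations.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 60 numeric candidates, roughly 18% positives.
X, y = make_classification(
    n_samples=3000, n_features=60, n_informative=10, n_redundant=10,
    weights=[0.82, 0.18], random_state=0,
)
# Blank out about 5% of values so the imputer has work to do.
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan


def build_pipeline(selector):
    """Every step, including the selector, is refit inside each training fold."""
    return Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
        ("select", selector),
        ("model", LogisticRegression(max_iter=1000, class_weight="balanced")),
    ])


candidates = {
    "all_60_features": "passthrough",  # baseline: no selection step
    "top_15_f_classif": SelectKBest(score_func=f_classif, k=15),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
results = {}
for name, selector in candidates.items():
    results[name] = cross_val_score(
        build_pipeline(selector), X, y, cv=cv, scoring="roc_auc"
    )
    print(name, results[name].round(3), round(float(results[name].mean()), 3))
```

Running the selector on the full dataset before splitting would let held-out labels influence which features survive, which is exactly the leakage this constraint rules out.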
Deliverables
- Build a baseline classification pipeline using all candidate features.
- Implement at least two feature selection techniques and compare them.
- Select a final feature set and justify why it is appropriate for this dataset.
- Report evaluation metrics on validation and test data.
- Summarize tradeoffs between predictive power, stability, and interpretability.
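Stability can be made concrete by checking how much the selected feature sets overlap across cross-validation folds. The sketch below uses mean pairwise Jaccard similarity, which is one reasonable measure among several. The helper names and the per-fold selections are hypothetical: the selections reuse feature names from the dataset table purely for illustration and are not results.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, List


def jaccard(a: Iterable[str], b: Iterable[str]) -> float:
    """Size of the intersection divided by size of the union."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty selections are treated as identical
    return len(a & b) / len(a | b)


def selection_stability(selections: List[List[str]]) -> float:
    """Mean Jaccard similarity over all pairs of per-fold selections."""
    pairs = list(combinations(selections, 2))
    if not pairs:
        raise ValueError("need at least two selections to measure stability")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def selection_frequency(selections: List[List[str]]) -> Dict[str, float]:
    """Fraction of folds in which each feature was selected."""
    counts = Counter(f for s in selections for f in set(s))
    return {f: c / len(selections) for f, c in counts.most_common()}


# Hypothetical selections from three folds (illustrative, not real output).
fold_selections = [
    ["debt_to_income", "revolving_utilization", "fico_band", "inquiries_6m"],
    ["debt_to_income", "revolving_utilization", "fico_band", "delinquencies_2y"],
    ["debt_to_income", "fico_band", "inquiries_6m", "log_income"],
]

stability = selection_stability(fold_selections)
frequency = selection_frequency(fold_selections)
print(round(stability, 3))
print(frequency)
```

Features chosen in every fold are easier to defend to reviewers than features that appear in only one, so the frequency table can support the final justification alongside the ROC-AUC comparison.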