NorthStar Lending wants a credit risk model to predict whether a personal loan applicant will default within 12 months. The current model uses all available columns and is difficult to explain to risk analysts, so the team wants a feature selection approach that improves generalization while keeping the model interpretable.

| Feature Group | Count | Examples |
|---|---|---|
| Applicant demographics | 8 | age, employment_length, home_ownership, region |
| Credit bureau variables | 14 | fico_score, delinquencies_2y, revolving_utilization, inquiries_6m |
| Loan application fields | 10 | loan_amount, term_months, interest_rate, purpose |
| Banking behavior | 8 | avg_monthly_balance, overdraft_count_90d, direct_deposit_flag |
| Engineered history features | 6 | debt_to_income_trend, utilization_change_90d, payment_to_income_ratio |

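
To make the size of the candidate set explicit, the group counts in the table can be tallied in a few lines. This is only a bookkeeping sketch: the dictionary name `FEATURE_GROUP_COUNTS` is illustrative, and the counts are copied directly from the table.

```python
# Counts copied from the feature-group table; the variable names are illustrative.
FEATURE_GROUP_COUNTS = {
    "Applicant demographics": 8,
    "Credit bureau variables": 14,
    "Loan application fields": 10,
    "Banking behavior": 8,
    "Engineered history features": 6,
}

# Total number of candidate features the baseline model uses.
total_features = sum(FEATURE_GROUP_COUNTS.values())
print(total_features)  # 46
```

Any selected subset is therefore drawn from 46 candidate features.
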
A good solution should improve out-of-sample performance over a baseline logistic regression trained on all 46 features, while reducing the feature set to a smaller, explainable subset. The targets on the holdout set are ROC-AUC >= 0.78 and PR-AUC >= 0.42, with no material degradation in calibration relative to the baseline.
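
One possible way to approach these criteria is sketched below with scikit-learn. It is not NorthStar's method: it uses L1-regularized logistic regression as the selector, then refits a plain logistic regression on the surviving features and compares it with the all-features baseline. Several things here are assumptions rather than facts from the brief: the data is synthetic (`make_classification` standing in for 46 numerically encoded features), the regularization strength `C=0.02` is arbitrary, PR-AUC is approximated by average precision, and "no material calibration degradation" is operationalized as a Brier score increase of at most `BRIER_TOLERANCE`. Real categorical fields such as home_ownership, region, and purpose would need encoding first.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, brier_score_loss, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Targets from the problem statement.
ROC_AUC_TARGET = 0.78
PR_AUC_TARGET = 0.42
# Assumption: allowed Brier score increase over the baseline (not from the brief).
BRIER_TOLERANCE = 0.005

# Synthetic stand-in for 46 encoded features with an imbalanced default label.
X, y = make_classification(
    n_samples=4000, n_features=46, n_informative=10, n_redundant=6,
    weights=[0.85, 0.15], random_state=0,
)
X_train, X_hold, y_train, y_hold = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)


def evaluate(model, X_eval, y_eval):
    """Return discrimination and calibration metrics for a fitted classifier."""
    p = model.predict_proba(X_eval)[:, 1]
    return {
        "roc_auc": float(roc_auc_score(y_eval, p)),
        "pr_auc": float(average_precision_score(y_eval, p)),  # PR-AUC estimate
        "brier": float(brier_score_loss(y_eval, p)),
    }


# Baseline: logistic regression on all features.
baseline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=2000))
baseline.fit(X_train, y_train)

# Selection step: the L1 penalty drives some coefficients exactly to zero.
selector = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="l1", solver="liblinear", C=0.02),
)
selector.fit(X_train, y_train)
selected = np.flatnonzero(selector[-1].coef_.ravel() != 0)

# Refit an ordinary logistic regression on the selected columns only,
# so the final coefficients are not shrunk by the selection penalty.
reduced = make_pipeline(StandardScaler(), LogisticRegression(max_iter=2000))
reduced.fit(X_train[:, selected], y_train)

report_baseline = evaluate(baseline, X_hold, y_hold)
report_reduced = evaluate(reduced, X_hold[:, selected], y_hold)

# Acceptance check against the stated targets and the assumed tolerance.
passes = bool(
    report_reduced["roc_auc"] >= ROC_AUC_TARGET
    and report_reduced["pr_auc"] >= PR_AUC_TARGET
    and report_reduced["brier"] <= report_baseline["brier"] + BRIER_TOLERANCE
)

print(f"kept {selected.size} of {X.shape[1]} features")
print("baseline:", report_baseline)
print("reduced: ", report_reduced)
print("meets targets:", passes)
```

The numbers printed on synthetic data say nothing about the real portfolio. In practice, `C` would be tuned with cross-validation on the training split only, so that the holdout set is touched once for the final comparison.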