NovaBank is hiring a machine learning specialist for its consumer lending team. You are given a realistic credit risk dataset and asked to demonstrate deep expertise in traditional ML by building, explaining, and defending a production-ready default prediction model.
The dataset contains historical loan applications and 12-month repayment outcomes from NovaBank's unsecured personal loan product. Its 38 features fall into five groups:

| Feature Group | Count | Examples |
|---|---|---|
| Applicant demographics | 6 | age, employment_status, region, years_at_address |
| Financial profile | 11 | annual_income, debt_to_income, revolving_utilization, existing_loans |
| Credit bureau signals | 9 | fico_band, delinquencies_12m, hard_inquiries_6m, credit_history_length |
| Loan attributes | 7 | loan_amount, term_months, interest_rate, purpose |
| Behavioral / derived | 5 | income_to_loan_ratio, utilization_trend, payment_to_income |
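
The derived features in the last row are named but not defined in this brief. The sketch below shows one plausible way they could be computed from the raw columns. It assumes `interest_rate` is an annual percentage, that `income_to_loan_ratio` means `annual_income / loan_amount`, and that `payment_to_income` means the amortized monthly payment over monthly income. The helper names `monthly_payment` and `derive_features` are hypothetical, and the dataset's actual definitions may differ.

```python
def monthly_payment(loan_amount: float, annual_rate_pct: float, term_months: int) -> float:
    """Fixed-rate amortization payment.

    Assumption: annual_rate_pct is an annual percentage (e.g. 12.0 for 12%).
    """
    monthly_rate = annual_rate_pct / 100.0 / 12.0
    if monthly_rate == 0:
        # No interest: principal is repaid in equal installments.
        return loan_amount / term_months
    return loan_amount * monthly_rate / (1.0 - (1.0 + monthly_rate) ** -term_months)


def derive_features(row: dict) -> dict:
    """Compute two illustrative derived features from one application record.

    Assumed definitions (not confirmed by the brief):
      income_to_loan_ratio = annual_income / loan_amount
      payment_to_income    = monthly payment / monthly income
    Rows with zero income or zero loan amount need handling upstream.
    """
    payment = monthly_payment(row["loan_amount"], row["interest_rate"], row["term_months"])
    return {
        "income_to_loan_ratio": row["annual_income"] / row["loan_amount"],
        "payment_to_income": payment / (row["annual_income"] / 12.0),
    }
```

Comparing recomputed values against the provided columns is a cheap way to confirm which definitions the dataset actually uses before relying on them in a model.
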

A strong solution should outperform a regularized logistic regression baseline, rank defaulters reliably ahead of non-defaulters, and provide explanations suitable for model risk review. “Good enough” means ROC-AUC above 0.82, PR-AUC above 0.42, and calibrated probabilities that can be used directly in approval policy decisions.
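
The acceptance criteria above can be checked mechanically on a held-out set. The following is a minimal, dependency-free sketch: it assumes PR-AUC is measured as average precision, and it reports the Brier score as a rough calibration indicator because the brief sets no numeric calibration threshold. The function names and the `check_targets` helper are hypothetical. In practice, scikit-learn's `roc_auc_score`, `average_precision_score`, and `brier_score_loss` compute the same quantities.

```python
ROC_AUC_TARGET = 0.82  # thresholds taken from the brief
PR_AUC_TARGET = 0.42


def roc_auc(y_true, y_score):
    """ROC-AUC via the rank-sum (Mann-Whitney) identity; ties get average ranks."""
    pairs = sorted(zip(y_score, y_true))
    n = len(pairs)
    n_pos = sum(y_true)
    n_neg = n - n_pos
    pos_rank_sum = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of 1-based ranks i+1 .. j
        pos_rank_sum += avg_rank * sum(label for _, label in pairs[i:j])
        i = j
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def average_precision(y_true, y_score):
    """Step-wise average precision: sum of (recall increase) * precision per threshold."""
    pairs = sorted(zip(y_score, y_true), key=lambda p: -p[0])
    n_pos = sum(y_true)
    true_pos = 0
    ap = 0.0
    i = 0
    while i < len(pairs):
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        group_pos = sum(label for _, label in pairs[i:j])
        true_pos += group_pos
        ap += (group_pos / n_pos) * (true_pos / j)  # j = predictions at or above threshold
        i = j
    return ap


def brier_score(y_true, y_prob):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(y_prob, y_true)) / len(y_true)


def check_targets(y_true, y_prob):
    """Return the metrics plus a flag for the two ranking thresholds in the brief."""
    metrics = {
        "roc_auc": roc_auc(y_true, y_prob),
        "pr_auc": average_precision(y_true, y_prob),
        "brier": brier_score(y_true, y_prob),
    }
    metrics["meets_ranking_targets"] = (
        metrics["roc_auc"] > ROC_AUC_TARGET and metrics["pr_auc"] > PR_AUC_TARGET
    )
    return metrics
```

Running `check_targets` on both the baseline's and the candidate's held-out predictions gives a like-for-like comparison. The Brier score alone does not prove calibration, so a reliability curve over probability bins is a sensible complement.
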