NovaBank, a digital lender processing ~120K personal loan applications per month, wants a default-risk model for pre-approval decisions. The hiring manager is less interested in raw modeling than in whether you can clearly justify why you chose a specific library and algorithm for a real production problem.
You are given a historical loan dataset built from application-time features only: 42 features across five groups.

| Feature Group | Count | Examples |
|---|---|---|
| Applicant financials | 12 | annual_income, debt_to_income, revolving_utilization, existing_loans |
| Credit history | 9 | fico_band, delinquencies_12m, inquiries_6m, oldest_trade_age |
| Application details | 7 | loan_amount, term_months, purpose, channel |
| Employment & demographics | 8 | employment_length, home_ownership, region, age_band |
| Derived risk features | 6 | payment_to_income_ratio, recent_inquiry_rate, utilization_bucket |
A strong solution should achieve a holdout ROC-AUC of at least 0.78 and a PR-AUC above the default-rate baseline (the fraction of defaulted loans in the holdout set, which is roughly what a random ranking would score), while remaining explainable enough for risk and compliance review. Your justification for the library and algorithm should be specific, evidence-based, and tied to the dataset and production constraints.
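
The two metric targets can be checked mechanically. The following is a minimal sketch using only the Python standard library; the function names (`roc_auc`, `average_precision`, `meets_targets`) and the toy labels and scores are hypothetical and not part of the NovaBank dataset. It treats PR-AUC as average precision and assumes scores are not tied in that calculation. In a real submission you would more likely call scikit-learn's `roc_auc_score` and `average_precision_score` on the holdout predictions.

```python
from typing import Dict, Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC via the rank-sum (Mann-Whitney U) identity, averaging ranks over ties."""
    n = len(y_true)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j to the end of the current group of tied scores.
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(y_true)
    n_neg = n - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, y_true) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision: mean precision at each default, ranked by descending score.

    Assumes no tied scores; ties would need threshold-level grouping.
    """
    order = sorted(range(len(y_true)), key=lambda i: -scores[i])
    hits, total = 0, 0.0
    for rank, idx in enumerate(order, start=1):
        if y_true[idx] == 1:
            hits += 1
            total += hits / rank
    return total / hits


def meets_targets(
    y_true: Sequence[int], scores: Sequence[float], min_roc_auc: float = 0.78
) -> Dict[str, object]:
    """Compare holdout metrics with the stated targets."""
    baseline = sum(y_true) / len(y_true)  # default rate in the holdout set
    roc = roc_auc(y_true, scores)
    ap = average_precision(y_true, scores)
    return {
        "roc_auc": roc,
        "pr_auc": ap,
        "pr_baseline": baseline,
        "passes": roc >= min_roc_auc and ap > baseline,
    }


# Toy holdout (hypothetical): 1 = defaulted, 0 = repaid.
toy_labels = [0, 0, 1, 0, 1, 0, 0, 1, 0, 0]
toy_scores = [0.10, 0.20, 0.80, 0.30, 0.35, 0.40, 0.05, 0.90, 0.15, 0.25]
print(meets_targets(toy_labels, toy_scores))
```

Reporting the PR-AUC next to its baseline matters because defaults are typically the minority class: a PR-AUC that looks low in absolute terms can still be a large lift over the default rate.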