LendWise, a consumer lending fintech processing roughly 120K loan applications per month, wants a machine learning model to predict whether an applicant will default within 12 months of loan origination. The model will support underwriting decisions and risk-based pricing, so it must improve risk separation without creating an overly complex production system.
You are given a historical dataset of funded loans from the last 3 years.

| Feature Group | Count | Examples |
|---|---|---|
| Applicant demographics | 6 | age, employment_length, home_ownership, state |
| Credit bureau variables | 12 | fico_score, revolving_utilization, delinquencies_2y, inquiries_6m |
| Financial variables | 10 | annual_income, debt_to_income, existing_loans, monthly_obligations |
| Loan attributes | 7 | loan_amount, term_months, interest_rate, purpose |
| Behavioral / derived | 5 | income_to_loan_ratio, credit_age_months, recent_inquiry_rate |

Target: default_12m, whether the customer becomes 90+ days past due or is charged off within 12 months of origination.

A good solution should achieve meaningful lift over a logistic regression baseline, with strong ranking quality for underwriting. Target performance on a held-out test set is ROC-AUC >= 0.82, PR-AUC >= 0.42, and top-decile lift > 2.5.
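
Of the three metrics, top-decile lift is the one without a standard library function, so a minimal sketch may help. It assumes the common definition, which the brief does not spell out: the default rate among the 10% of loans with the highest predicted risk, divided by the overall default rate. The function name `top_decile_lift` and the toy data are hypothetical. ROC-AUC and PR-AUC can be computed with scikit-learn's `roc_auc_score` and `average_precision_score` (average precision being one common estimate of PR-AUC).

```python
def top_decile_lift(y_true, y_score, fraction=0.10):
    """Default rate in the highest-scored `fraction` of loans,
    divided by the overall default rate.

    Assumed definition; the brief does not define lift explicitly.
    y_true holds 0/1 labels (1 = default), y_score holds predicted risk.
    """
    n = len(y_true)
    if n == 0 or n != len(y_score):
        raise ValueError("y_true and y_score must be non-empty and equal in length")

    overall_rate = sum(y_true) / n
    if overall_rate == 0:
        raise ValueError("lift is undefined when there are no defaults")

    # Rank loans from highest to lowest predicted risk.
    ranked = sorted(zip(y_score, y_true), key=lambda pair: pair[0], reverse=True)

    # Size of the top bucket; at least one loan so small samples still work.
    k = max(1, int(n * fraction))
    top_rate = sum(label for _, label in ranked[:k]) / k
    return top_rate / overall_rate


# Toy example: 20 loans, 4 defaults, scores already in descending order.
labels = [1, 1, 0, 0, 1, 0, 0, 0, 0, 1] + [0] * 10
scores = [1.0 - i / 20 for i in range(20)]
lift = top_decile_lift(labels, scores)
print(round(lift, 2))  # 5.0: both loans in the top decile defaulted vs. a 20% base rate
```

A lift of 2.5 would mean the riskiest tenth of applicants defaults at 2.5 times the portfolio rate. A model that ranks at random has an expected lift of 1.0.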