LendWise, a mid-size digital lender processing about 120K personal loan applications per month, wants to improve its default prediction model used during underwriting. The current logistic regression model is fast and stable, but its ranking quality is weak, causing both avoidable defaults and unnecessary declines.
You are given a historical underwriting dataset built at application time.
| Feature Group | Count | Examples |
|---|---|---|
| Applicant financials | 12 | annual_income, debt_to_income, revolving_utilization, delinquencies_2y |
| Credit history | 10 | fico_band, credit_history_length_months, inquiries_6m, public_records |
| Loan attributes | 6 | loan_amount, term_months, interest_rate_offer, purpose |
| Behavioral / channel | 5 | application_channel, device_type, session_length, referral_source |
| Derived features | 7 | income_to_loan_ratio, recent_inquiry_rate, utilization_bucket |
Target: default_12m, whether the borrower defaulted within 12 months of origination.

A good solution should improve model performance over the current logistic regression baseline by at least 5 points of ROC-AUC while keeping batch scoring under 200 ms per 1,000 applications. The risk team also wants a clear explanation of which optimizations mattered most.
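The two acceptance criteria can be checked mechanically. The sketch below is one possible way to do so in plain Python, not a prescribed evaluation harness. It assumes "5 points of ROC-AUC" means an absolute gain of 0.05, and that latency is measured as wall-clock time for scoring one batch. The helper names `roc_auc`, `ms_per_1000`, and `meets_targets` are hypothetical and exist only for illustration.

```python
import time
from typing import Callable, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC via the rank-sum (Mann-Whitney U) form, using average ranks for ties."""
    n = len(labels)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Extend j to the end of the current group of tied scores.
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC-AUC needs both defaulters and non-defaulters")
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def ms_per_1000(score_batch: Callable[[list], list], rows: list, repeats: int = 5) -> float:
    """Best-of-`repeats` wall-clock time to score `rows`, in ms per 1,000 rows."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        score_batch(rows)
        best = min(best, time.perf_counter() - start)
    return best * 1000.0 * 1000.0 / len(rows)  # seconds -> ms, then scale to 1,000 rows


def meets_targets(
    baseline_auc: float,
    candidate_auc: float,
    latency_ms: float,
    min_uplift: float = 0.05,       # assumption: "5 points" = 0.05 absolute ROC-AUC
    max_latency_ms: float = 200.0,  # from the brief: under 200 ms per 1,000 applications
) -> bool:
    """True when the candidate clears both the uplift and the latency target."""
    return (candidate_auc - baseline_auc) >= min_uplift and latency_ms < max_latency_ms
```

In use, `roc_auc` would be computed for both the logistic regression baseline and the candidate on the same held-out applications, and `ms_per_1000` would wrap the candidate's batch prediction call on a representative batch. Taking the best of several timing runs reduces noise from other processes, though a production check might prefer a percentile over repeated runs.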