LendWise, a digital consumer lending platform processing ~200K loan applications per quarter, wants to improve its default-risk model without increasing approval latency. The credit team wants to understand how feature engineering affects model quality, stability, and interpretability.
You are given an offline training dataset of historical loan applications and 12-month repayment outcomes:

| Feature Group | Count | Examples |
|---|---|---|
| Applicant demographics | 6 | age, employment_length, residence_type, region |
| Financial variables | 10 | annual_income, monthly_debt, credit_utilization, revolving_balance |
| Credit history | 8 | fico_band, delinquencies_2y, inquiries_6m, oldest_trade_age_months |
| Loan attributes | 5 | loan_amount, term_months, interest_rate, purpose |
| Behavioral / derived raw fields | 7 | recent_balance_change, payment_to_income_raw, open_to_buy, utilization_trend_3m |
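
As a rough illustration of the kind of feature engineering the brief has in mind, the sketch below derives a few ratio and flag features from columns named in the table. The derived feature names (`debt_to_income`, `loan_to_income`, `log_annual_income`, `has_recent_delinquency`) and the `engineer_features` helper are hypothetical, and the sketch assumes `annual_income`, `monthly_debt`, and `loan_amount` are in the same currency unit; none of this is specified by the dataset description.

```python
import math


def engineer_features(row):
    """Return a copy of `row` with a few illustrative derived features added.

    `row` is a dict keyed by raw column names from the feature table.
    The derived names below are assumptions made for this sketch.
    """
    out = dict(row)
    income = row["annual_income"]
    has_income = income > 0

    # Debt-to-income: annualized monthly debt divided by annual income.
    out["debt_to_income"] = (row["monthly_debt"] * 12) / income if has_income else float("nan")

    # Loan size relative to income.
    out["loan_to_income"] = row["loan_amount"] / income if has_income else float("nan")

    # Log transform to compress the long right tail typical of income.
    out["log_annual_income"] = math.log1p(max(income, 0))

    # Binary flag, often easier to explain in risk review than a raw count.
    out["has_recent_delinquency"] = int(row["delinquencies_2y"] > 0)
    return out


# Made-up applicant record, for illustration only.
example = {
    "annual_income": 60000,
    "monthly_debt": 1500,
    "loan_amount": 12000,
    "delinquencies_2y": 0,
}
features = engineer_features(example)
```

Ratios and flags like these stay easy to explain to a risk reviewer, which is one reason they are a common first step before trying less transparent transformations.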

`default_12m`: whether the borrower defaulted within 12 months

A good solution should improve model performance over a raw-feature baseline by using thoughtful feature engineering, while keeping the model explainable enough for risk review. Target at least a 0.03 absolute lift in ROC-AUC or a 0.05 lift in PR-AUC versus a baseline logistic regression on raw inputs.
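
The lift criterion can be checked with a small helper like the one below. This is a minimal, dependency-free sketch: `roc_auc`, `average_precision`, and `meets_target` are hypothetical names, PR-AUC is approximated by average precision, and the scores are made-up toy values. In practice one would more likely use `sklearn.metrics.roc_auc_score` and `sklearn.metrics.average_precision_score` on a held-out set, and the pairwise ROC-AUC loop here is quadratic, so it only suits small samples.

```python
def roc_auc(y_true, scores):
    """ROC-AUC as the chance a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Average precision: mean of precision@k over the ranks k of each positive."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / k
    return total / hits


def meets_target(y_true, baseline_scores, candidate_scores,
                 roc_lift=0.03, pr_lift=0.05):
    """Compare a candidate model to the baseline on the same labels."""
    d_roc = roc_auc(y_true, candidate_scores) - roc_auc(y_true, baseline_scores)
    d_pr = (average_precision(y_true, candidate_scores)
            - average_precision(y_true, baseline_scores))
    return {
        "roc_auc_lift": d_roc,
        "pr_auc_lift": d_pr,
        # Either threshold is sufficient, per the stated target.
        "meets_target": d_roc >= roc_lift or d_pr >= pr_lift,
    }


# Toy held-out labels and predicted default probabilities (illustrative only).
y_true = [0, 0, 1, 0, 1, 0, 0, 1]
baseline_scores = [0.1, 0.4, 0.35, 0.2, 0.8, 0.5, 0.3, 0.45]
candidate_scores = [0.1, 0.2, 0.6, 0.15, 0.9, 0.3, 0.25, 0.7]
result = meets_target(y_true, baseline_scores, candidate_scores)
```

Both models should be scored on the same held-out applications. Otherwise a difference of a few hundredths in AUC can come from the sample rather than from the features.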