LendWise, a digital consumer lending platform that processes about 120K loan applications per month, has deployed a simple baseline model to predict 90-day loan default. The model performs poorly on both training and validation data, so the risk team suspects underfitting. Your task is to diagnose the issue and increase model capacity without overfitting.
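
One way to make the underfitting diagnosis concrete is to compare the same metric on the training and validation sets. When both scores are low and close together, the model is probably too simple; a large gap between them points to overfitting instead. The sketch below is a minimal illustration under stated assumptions: `diagnose_fit` is a hypothetical helper, the 0.78 default is the AUC-ROC target given in this brief, and the 0.03 gap tolerance is an assumed value that the brief does not specify.

```python
def diagnose_fit(train_auc, val_auc, target_auc=0.78, max_gap=0.03):
    """Classify model fit from train and validation AUC-ROC.

    target_auc is the brief's AUC-ROC target; max_gap is an ASSUMED
    tolerance for the train/validation difference.
    """
    gap = train_auc - val_auc
    # Low training score with a small gap: the model cannot even fit
    # the data it was trained on, which is the signature of underfitting.
    if train_auc < target_auc and gap <= max_gap:
        return "underfitting"
    # Training score well above validation score: the model has
    # memorized patterns that do not generalize.
    if gap > max_gap:
        return "overfitting"
    # Small gap and adequate training score: check the validation score.
    return "acceptable" if val_auc >= target_auc else "borderline"


# Illustrative numbers only, not LendWise results.
print(diagnose_fit(0.64, 0.63))  # underfitting
print(diagnose_fit(0.95, 0.80))  # overfitting
```

Running the same check after each capacity increase (more features, interactions, or a more flexible model class) shows whether the model is moving out of the underfitting regime without opening a train/validation gap.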
The training data contains one row per funded loan application, with 32 features in five groups:
| Feature Group | Count | Examples |
|---|---|---|
| Applicant demographics | 6 | age, employment_length, home_ownership, region |
| Credit and bureau signals | 9 | fico_score, delinquencies_2y, revolving_utilization, inquiries_6m |
| Financials | 8 | annual_income, debt_to_income, monthly_obligations, loan_amount |
| Loan attributes | 5 | term_months, interest_rate, purpose, channel |
| Behavioral aggregates | 4 | prior_loans_count, prior_default_flag, avg_days_late, autopay_enrolled |
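
The feature groups in the table can be written down as a small schema, which makes it easy to check a loaded dataset against the brief. This is a sketch only: `FEATURE_GROUPS` and `missing_examples` are hypothetical names, and only the example columns named in the table are listed, since the brief does not give the remaining column names.

```python
# Group name -> (feature count from the table, example columns from the table).
FEATURE_GROUPS = {
    "applicant_demographics": (
        6, ["age", "employment_length", "home_ownership", "region"]),
    "credit_and_bureau_signals": (
        9, ["fico_score", "delinquencies_2y", "revolving_utilization",
            "inquiries_6m"]),
    "financials": (
        8, ["annual_income", "debt_to_income", "monthly_obligations",
            "loan_amount"]),
    "loan_attributes": (
        5, ["term_months", "interest_rate", "purpose", "channel"]),
    "behavioral_aggregates": (
        4, ["prior_loans_count", "prior_default_flag", "avg_days_late",
            "autopay_enrolled"]),
}

# Total number of features implied by the table's Count column.
TOTAL_FEATURES = sum(count for count, _ in FEATURE_GROUPS.values())


def missing_examples(columns):
    """Return the table's example columns that are absent from `columns`."""
    present = set(columns)
    return [name
            for _, names in FEATURE_GROUPS.values()
            for name in names
            if name not in present]


print(TOTAL_FEATURES)
print(missing_examples(["age", "fico_score"])[:3])
```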
A good solution should clearly identify the signs of underfitting and improve materially on the weak baseline. Target AUC-ROC >= 0.78 and F1 >= 0.48 on the holdout set, while keeping train and validation performance stable (no large gap between the two).
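
The acceptance criteria above can be sketched as a small check. The code below computes AUC-ROC as the fraction of (defaulter, non-defaulter) pairs that the scores rank correctly, computes F1 at a decision threshold, and then tests both targets plus a train/holdout gap. It is a minimal pure-Python illustration, not a production implementation: the function names are hypothetical, the 0.5 threshold and the 0.03 gap tolerance are assumptions the brief does not specify, and the pairwise AUC loop is quadratic, so a library routine would be preferable on the full dataset. Note that F1 depends on the chosen threshold, while AUC-ROC does not.

```python
def auc_roc(y_true, y_score):
    """AUC-ROC as the share of positive/negative pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC is undefined when only one class is present")
    # A tie between a positive and a negative score counts as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1_at_threshold(y_true, y_score, threshold=0.5):
    """F1 for the positive (default) class at an ASSUMED score threshold."""
    tp = fp = fn = 0
    for y, s in zip(y_true, y_score):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 1:
            fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def meets_targets(train_auc, holdout_auc, holdout_f1,
                  min_auc=0.78, min_f1=0.48, max_gap=0.03):
    """Check the brief's targets; max_gap is an ASSUMED stability tolerance."""
    return (holdout_auc >= min_auc
            and holdout_f1 >= min_f1
            and (train_auc - holdout_auc) <= max_gap)


# Toy labels and scores for illustration only, not LendWise data.
y = [1, 1, 0, 0, 0, 1]
scores = [0.9, 0.6, 0.4, 0.2, 0.7, 0.3]
print(auc_roc(y, scores), f1_at_threshold(y, scores))
```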