FinNova, a digital lending platform processing roughly 25,000 personal loan applications per month, wants a simple predictive modeling workflow to estimate whether an applicant will default within 12 months. The goal is to teach core predictive modeling concepts: data preparation, model training, evaluation, and interpretation using Python.
You are given a historical loan dataset with one row per funded application.

| Feature Group | Count | Examples |
|---|---|---|
| Applicant demographics | 5 | age, employment_status, residence_type |
| Financial attributes | 8 | annual_income, debt_to_income, credit_utilization, open_credit_lines |
| Loan details | 6 | loan_amount, term_months, interest_rate, purpose |
| Credit history | 7 | fico_score, delinquencies_2y, inquiries_6m, bankruptcies |
| Behavioral / derived | 4 | income_to_loan_ratio, recent_balance_change, payment_to_income |

Target: `default_12m`, whether the borrower defaulted within 12 months.

A good solution should outperform a naive majority-class baseline and produce a model that is easy to explain to risk analysts. Target performance is ROC-AUC >= 0.78 and F1 >= 0.50 on a held-out test set.