FinSure, a consumer lending platform processing ~250K loan applications per quarter, wants to standardize how its risk team chooses between classification and regression models. For each funded loan, the team tracks both whether the borrower defaults within 12 months and the total dollar loss if default occurs.

You are given a historical loan dataset and must build two separate supervised learning models on the same feature set:

| Feature Group | Count | Examples |
|---|---|---|
| Applicant demographics | 6 | age, employment_length, home_ownership |
| Credit history | 8 | fico_score, delinquencies_2y, revolving_utilization |
| Loan attributes | 7 | loan_amount, interest_rate, term_months, purpose |
| Income & affordability | 5 | annual_income, dti_ratio, verified_income |
| Behavioral / bureau flags | 4 | recent_inquiries, prior_defaults, bankruptcies |

Targets:

- default_12m: binary target (1 = default within 12 months, 0 = no default)
- loss_amount_usd: continuous target, highly right-skewed with many zeros for non-defaulted loans

A strong solution should clearly explain the difference between classification and regression through model choice, outputs, loss functions, and evaluation metrics. The classification model should achieve ROC-AUC > 0.78 and the regression model should achieve MAE < $1,150 on the holdout set.

Build one classification model for default_12m and one regression model for loss_amount_usd.
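
To make the contrast in outputs, loss functions, and evaluation metrics concrete, here is a minimal pure-Python sketch. It is not FinSure's pipeline: the eight holdout rows, the predicted probabilities, and the predicted dollar losses are invented for illustration, and the helper names (`log_loss`, `roc_auc`, `mean_absolute_error`) are hypothetical stand-ins for whatever library the team actually uses. It only shows that a classifier for default_12m emits probabilities scored by log loss and ROC-AUC, while a regressor for loss_amount_usd emits dollar amounts scored by MAE.

```python
import math


def log_loss(y_true, p_pred, eps=1e-12):
    """Binary cross-entropy, a common training loss for a default classifier."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip so log(0) never occurs
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def roc_auc(y_true, scores):
    """Share of (defaulter, non-defaulter) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_absolute_error(y_true, y_pred):
    """Average absolute dollar error, the regression metric named in the brief."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Invented holdout rows (assumption): two defaults, six zero-loss loans.
default_12m = [0, 0, 0, 1, 0, 1, 0, 0]
loss_amount_usd = [0.0, 0.0, 0.0, 4200.0, 0.0, 9500.0, 0.0, 0.0]

# Classification output: a probability of default per loan (invented values).
p_default = [0.05, 0.10, 0.30, 0.70, 0.20, 0.25, 0.02, 0.15]

# Regression output: a predicted dollar loss per loan (invented values).
pred_loss = [100.0, 200.0, 600.0, 3000.0, 400.0, 5000.0, 50.0, 300.0]

bce = log_loss(default_12m, p_default)
auc = roc_auc(default_12m, p_default)                   # 11/12, about 0.917
mae = mean_absolute_error(loss_amount_usd, pred_loss)   # 918.75

# Baseline for a zero-heavy target: always predict $0 loss.
baseline_mae = mean_absolute_error(loss_amount_usd, [0.0] * len(loss_amount_usd))  # 1712.5

print(f"log loss = {bce:.3f}")
print(f"ROC-AUC  = {auc:.3f}  (target > 0.78: {auc > 0.78})")
print(f"MAE      = ${mae:,.2f}  (target < $1,150: {mae < 1150})")
print(f"MAE of all-zero baseline = ${baseline_mae:,.2f}")
```

ROC-AUC depends only on how the probabilities rank defaulters against non-defaulters, so it ignores calibration, whereas MAE is expressed in dollars and is sensitive to the scale of every prediction. Because most loans have zero loss, comparing the regressor against an all-zero baseline is a quick check that it has learned something beyond the mass at zero.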