Northstar Bank is building a credit risk model to predict whether a small-business loan will default within 12 months. The Head of Lending does not need mathematical proofs, but does need a clear explanation of why a more complex model can perform worse in production even if it looks better on training data.
You are given a historical dataset of 42 features across five groups, used for a supervised binary classification task.
| Feature Group | Count | Examples |
|---|---|---|
| Applicant financials | 12 | annual_revenue, debt_to_income, cash_reserves, profit_margin |
| Credit history | 8 | bureau_score, delinquencies_12m, credit_utilization, prior_defaults |
| Loan attributes | 6 | loan_amount, term_months, interest_rate, collateral_flag |
| Business profile | 9 | industry, years_in_business, employee_count, region |
| Behavioral / derived | 7 | recent_balance_trend, application_completion_time, document_resubmissions |
Target variable: `default_12m` = 1 if the loan defaulted within 12 months, else 0.

A strong solution should show how to explain the bias-variance tradeoff to a non-technical stakeholder using model results, not theory alone. The final recommendation should justify a model that generalizes well and achieves stable out-of-sample performance.
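One concrete way to ground the explanation for the Head of Lending is a train-versus-holdout comparison: a flexible model that memorizes the training data looks perfect in-sample but degrades out-of-sample, while a simpler model scores similarly on both. The sketch below illustrates this on synthetic stand-in data (not the Northstar dataset): a single-threshold rule as the "high-bias" model and 1-nearest-neighbour as the "high-variance" model. All names and the data-generating process are hypothetical, chosen only to make the generalization gap visible.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the loan data: two informative features,
# label driven by their sum plus label noise (illustration only).
n = 400
X = rng.normal(size=(n, 2))
y = (X[:, 0] + X[:, 1] + rng.normal(scale=0.8, size=n) > 0).astype(int)

X_tr, X_te = X[:300], X[300:]
y_tr, y_te = y[:300], y[300:]

def simple_acc(X_eval, y_eval):
    # "High-bias" model: one fixed threshold on the feature sum.
    preds = (X_eval[:, 0] + X_eval[:, 1] > 0).astype(int)
    return (preds == y_eval).mean()

def one_nn_acc(X_fit, y_fit, X_eval, y_eval):
    # "High-variance" model: 1-nearest-neighbour, which memorizes
    # the training set (each training point is its own neighbour).
    d = ((X_eval[:, None, :] - X_fit[None, :, :]) ** 2).sum(axis=-1)
    preds = y_fit[d.argmin(axis=1)]
    return (preds == y_eval).mean()

simple_tr, simple_te = simple_acc(X_tr, y_tr), simple_acc(X_te, y_te)
nn_tr = one_nn_acc(X_tr, y_tr, X_tr, y_tr)
nn_te = one_nn_acc(X_tr, y_tr, X_te, y_te)

print(f"simple model: train={simple_tr:.2f}  holdout={simple_te:.2f}")
print(f"1-NN model:   train={nn_tr:.2f}  holdout={nn_te:.2f}")
```

The talking point for a non-technical audience falls straight out of the printout: the 1-NN model scores 100% on loans it has already seen but noticeably worse on new loans, whereas the simple rule scores about the same on both, which is the property we actually want in production.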