Northstar Bank is building a credit risk model to predict whether a small-business loan will default within 12 months. The Head of Lending does not need mathematical proofs, but does need a clear explanation of why a more complex model can perform worse in production even if it looks better on training data.
You are given a historical dataset of 42 features across five groups, used for a supervised binary classification task.
| Feature Group | Count | Examples |
|---|---|---|
| Applicant financials | 12 | annual_revenue, debt_to_income, cash_reserves, profit_margin |
| Credit history | 8 | bureau_score, delinquencies_12m, credit_utilization, prior_defaults |
| Loan attributes | 6 | loan_amount, term_months, interest_rate, collateral_flag |
| Business profile | 9 | industry, years_in_business, employee_count, region |
| Behavioral / derived | 7 | recent_balance_trend, application_completion_time, document_resubmissions |
Target variable: `default_12m` = 1 if the loan defaulted within 12 months, else 0.

A strong solution should show how to explain the bias-variance tradeoff to a non-technical stakeholder using model results, not theory alone. The final recommendation should justify a model that generalizes well and achieves stable out-of-sample performance.
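One concrete way to ground the explanation for the Head of Lending is a train-versus-holdout comparison: a flexible model that memorizes the training data looks perfect in-sample but degrades out-of-sample, while a simpler model scores similarly on both. The sketch below illustrates this on synthetic stand-in data (not the Northstar dataset): a single-threshold rule as the "high-bias" model and 1-nearest-neighbour as the "high-variance" model. All names and the data-generating process are hypothetical, chosen only to make the generalization gap visible.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the loan data: two informative features,
# label driven by their sum plus label noise (illustration only).
n = 400
X = rng.normal(size=(n, 2))
y = (X[:, 0] + X[:, 1] + rng.normal(scale=0.8, size=n) > 0).astype(int)

X_tr, X_te = X[:300], X[300:]
y_tr, y_te = y[:300], y[300:]

def simple_acc(X_eval, y_eval):
    # "High-bias" model: one fixed threshold on the feature sum.
    preds = (X_eval[:, 0] + X_eval[:, 1] > 0).astype(int)
    return (preds == y_eval).mean()

def one_nn_acc(X_fit, y_fit, X_eval, y_eval):
    # "High-variance" model: 1-nearest-neighbour, which memorizes
    # the training set (each training point is its own neighbour).
    d = ((X_eval[:, None, :] - X_fit[None, :, :]) ** 2).sum(axis=-1)
    preds = y_fit[d.argmin(axis=1)]
    return (preds == y_eval).mean()

simple_tr, simple_te = simple_acc(X_tr, y_tr), simple_acc(X_te, y_te)
nn_tr = one_nn_acc(X_tr, y_tr, X_tr, y_tr)
nn_te = one_nn_acc(X_tr, y_tr, X_te, y_te)

print(f"simple model: train={simple_tr:.2f}  holdout={simple_te:.2f}")
print(f"1-NN model:   train={nn_tr:.2f}  holdout={nn_te:.2f}")
```

The talking point for a non-technical audience falls straight out of the printout: the 1-NN model scores 100% on loans it has already seen but noticeably worse on new loans, whereas the simple rule scores about the same on both, which is the property we actually want in production.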