Business Context
LendWise, a mid-size digital lender processing about 120K personal loan applications per month, wants a credit risk model to predict 90-day default before underwriting decisions are made. The current model performs well offline but degrades noticeably after deployment, and the risk team suspects overfitting.
Dataset
| Feature Group | Count | Examples |
|---|---|---|
| Applicant profile | 12 | age, employment_length, annual_income, housing_status |
| Credit bureau | 15 | fico_score, revolving_utilization, delinquency_count, inquiries_6m |
| Loan attributes | 8 | loan_amount, interest_rate, term_months, purpose |
| Behavioral / derived | 10 | debt_to_income, credit_age_months, utilization_trend_3m |
- Size: 240K historical applications, 45 features
- Target: Binary — defaulted within 90 days of origination (1) vs non-default (0)
- Class balance: 11.5% positive, 88.5% negative
- Missing data: 18% missing in employment_length, 9% in bureau variables for thin-file applicants, <2% elsewhere
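The missingness pattern above (18% in employment_length, 9% in bureau variables for thin-file applicants) is itself informative, so imputation should preserve it. A minimal sketch, assuming a scikit-learn preprocessing setup; the DataFrame here is synthetic and the column names are taken from the feature table, not from real LendWise data:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer

# Illustrative frame mimicking the stated missingness rates:
# ~18% missing employment_length, ~9% missing a bureau variable.
rng = np.random.default_rng(0)
n = 1_000
df = pd.DataFrame({
    "employment_length": rng.integers(0, 30, n).astype(float),
    "fico_score": rng.integers(300, 850, n).astype(float),
    "loan_amount": rng.uniform(1_000, 40_000, n),
})
df.loc[rng.random(n) < 0.18, "employment_length"] = np.nan
df.loc[rng.random(n) < 0.09, "fico_score"] = np.nan

# Median imputation plus missingness-indicator columns, so a downstream
# model can learn that a missing bureau value (thin file) carries signal.
pre = ColumnTransformer([
    ("impute", SimpleImputer(strategy="median", add_indicator=True),
     ["employment_length", "fico_score", "loan_amount"]),
])
X = pre.fit_transform(df)
# 3 imputed columns plus one indicator per column that had missing values
print(X.shape)
```

Because `add_indicator=True` only emits indicators for columns that actually contain missing values, the same transformer works unchanged in batch and online scoring as long as it is fit once on training data and reused.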
Success Criteria
A strong solution improves generalization on an unseen holdout set and explains clearly how overfitting is detected and prevented. The minimum acceptable bar is ROC-AUC >= 0.80, PR-AUC >= 0.42, and a train-test AUC gap below 0.03.
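The three success metrics can be computed as sketched below. Since the LendWise data is not available here, `make_classification` stands in for it; only the class balance (11.5% positive) and feature count (45) mirror the dataset description, everything else is an illustrative assumption:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in: ~11.5% positives, 45 features.
X, y = make_classification(n_samples=10_000, n_features=45, n_informative=15,
                           weights=[0.885], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

model = GradientBoostingClassifier(max_depth=3, random_state=42).fit(X_tr, y_tr)
p_tr = model.predict_proba(X_tr)[:, 1]
p_te = model.predict_proba(X_te)[:, 1]

auc_tr = roc_auc_score(y_tr, p_tr)
auc_te = roc_auc_score(y_te, p_te)
pr_te = average_precision_score(y_te, p_te)  # average precision as PR-AUC
gap = auc_tr - auc_te                        # the train-test gap criterion
print(f"train AUC={auc_tr:.3f}  test AUC={auc_te:.3f}  "
      f"gap={gap:.3f}  PR-AUC={pr_te:.3f}")
```

On the real data, the gap would be measured between training and a time-based holdout, since the deployment degradation described above suggests temporal drift as well as overfitting.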
Constraints
- Predictions must be returned in <50 ms per application
- The risk team needs interpretable drivers for adverse action review
- Retraining is allowed monthly; feature generation must work in batch and online scoring
Deliverables
- Define overfitting in the context of this default prediction problem.
- Build a baseline model and a less-overfit model, then compare train/validation/test performance.
- Show at least three prevention methods (for example: regularization, feature selection, cross-validation, early stopping, or limiting model complexity).
- Recommend a final production-ready approach and justify the tradeoffs.
- Provide code to train, evaluate, and diagnose overfitting using appropriate metrics and plots/tables.
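The baseline-versus-regularized comparison asked for above can be sketched as follows. This is not a submitted solution, just an illustration of the diagnostic: an unconstrained tree (high capacity, prone to overfitting) against an L2-regularized logistic regression, with the train-test AUC gap as the overfitting signal. The data is again a synthetic stand-in:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic imbalanced data mirroring the stated shape (45 features, ~11.5% positives).
X, y = make_classification(n_samples=10_000, n_features=45, n_informative=15,
                           weights=[0.885], random_state=7)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=7)

def auc_gap(model):
    """Fit, then return (train AUC, test AUC, gap) as an overfitting diagnostic."""
    model.fit(X_tr, y_tr)
    tr = roc_auc_score(y_tr, model.predict_proba(X_tr)[:, 1])
    te = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    return tr, te, tr - te

results = {}
for name, m in [
    ("unconstrained tree", DecisionTreeClassifier(random_state=7)),
    ("logreg L2 (C=0.1)", LogisticRegression(C=0.1, max_iter=2000)),
]:
    tr, te, gap = results[name] = auc_gap(m)
    print(f"{name}: train={tr:.3f}  test={te:.3f}  gap={gap:.3f}")
```

The unconstrained tree typically memorizes the training set (train AUC near 1.0) while the regularized linear model shows a much smaller gap; a production candidate such as constrained gradient boosting with early stopping would aim to close the test-AUC difference while keeping the gap under the 0.03 criterion, and the linear model's coefficients also satisfy the adverse-action interpretability constraint.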