Optimize Loan Default Model Performance

Business Context

LendWise, a mid-size digital lender processing about 120K personal loan applications per month, wants to improve its default prediction model used during underwriting. The current logistic regression model is fast and stable, but its ranking quality is weak, causing both avoidable defaults and unnecessary declines.

Dataset

You are given a historical underwriting dataset built at application time.

Feature Group	Count	Examples
Applicant financials	12	annual_income, debt_to_income, revolving_utilization, delinquencies_2y
Credit history	10	fico_band, credit_history_length_months, inquiries_6m, public_records
Loan attributes	6	loan_amount, term_months, interest_rate_offer, purpose
Behavioral / channel	5	application_channel, device_type, session_length, referral_source
Derived features	7	income_to_loan_ratio, recent_inquiry_rate, utilization_bucket

Rows: 420K applications from the last 24 months
Target: default_12m — whether the borrower defaulted within 12 months of origination
Class balance: 11.4% positive, 88.6% negative
Missing data: 9% missing in employment length, 6% in revolving utilization, 3% in channel/device fields
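
One way to approach the missing values and class imbalance described above is sketched here in plain Python. It assumes median imputation with an added missing-value flag and the common "balanced" weighting heuristic, n_samples / (n_classes * class_count). The helper names and the toy values are hypothetical and are not part of the dataset.

```python
from statistics import median


def impute_with_indicator(values):
    """Fill None entries with the median of observed values and flag them."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    filled = [fill if v is None else v for v in values]
    flags = [1 if v is None else 0 for v in values]
    return filled, flags


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    n = len(labels)
    classes = sorted(set(labels))
    return {c: n / (len(classes) * labels.count(c)) for c in classes}


# Toy column standing in for employment length (None marks a missing value).
employment_length = [2.0, None, 5.0, 10.0, None, 1.0]
filled, flags = impute_with_indicator(employment_length)

# Toy labels at the stated 11.4% positive rate: 114 defaults out of 1,000.
labels = [1] * 114 + [0] * 886
weights = balanced_class_weights(labels)
```

Keeping the flag column lets a model learn from the fact that a field was missing, which may itself carry signal in underwriting data. In a real pipeline the median should be computed on the training split only.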

Success Criteria

A good solution should improve on the current logistic regression baseline by at least 5 points of ROC-AUC (0.05 absolute) while keeping batch scoring under 200 ms per 1,000 applications. The risk team also wants a clear explanation of which optimizations mattered most.
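
The two success criteria can be checked mechanically. The sketch below is a minimal illustration: it assumes "5 points" means a 0.05 absolute gain in ROC-AUC, computes ROC-AUC through the rank-sum identity, and normalizes wall-clock scoring time to milliseconds per 1,000 rows. All function names are hypothetical.

```python
import time


def roc_auc(y_true, scores):
    """ROC-AUC via the rank-sum (Mann-Whitney U) identity, averaging tied ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole group of tied scores.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, y_true) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def measure_ms_per_1000(score_fn, batch):
    """Time one call to score_fn and scale the result to ms per 1,000 rows."""
    start = time.perf_counter()
    score_fn(batch)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return elapsed_ms * 1000 / len(batch)


def meets_criteria(baseline_auc, candidate_auc, ms_per_1000,
                   min_lift=0.05, max_ms=200.0):
    """Assumed reading of the criteria: +0.05 AUC and under 200 ms per 1,000."""
    return (candidate_auc - baseline_auc) >= min_lift and ms_per_1000 < max_ms


auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

A single timing run is noisy, so a real latency check would repeat the measurement over several batches and look at a high percentile rather than one sample.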

Constraints

Predictions are generated in batch every 15 minutes for underwriting queues
The model must support reason-code style explanations for adverse action review
Training budget is moderate; a GPU-only solution is not required
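
To illustrate the reason-code constraint, here is a minimal sketch for a linear scoring model. It assumes each feature's contribution is its coefficient times the applicant's deviation from a reference applicant, and it reports the largest positive contributions as adverse reasons. The coefficients, reference values, and function name are hypothetical. A tree ensemble would need a separate attribution method to produce comparable per-feature contributions.

```python
def reason_codes(coefficients, applicant, reference, top_k=2):
    """Return the top_k features pushing the score toward default."""
    contributions = {
        name: coefficients[name] * (applicant[name] - reference[name])
        for name in coefficients
    }
    # Only features that raise the predicted risk count as adverse reasons.
    adverse = [(name, c) for name, c in contributions.items() if c > 0]
    adverse.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in adverse[:top_k]]


# Hypothetical log-odds coefficients and a reference (for example, average) applicant.
coefficients = {"debt_to_income": 2.0, "revolving_utilization": 1.5, "inquiries_6m": 0.3}
reference = {"debt_to_income": 0.25, "revolving_utilization": 0.30, "inquiries_6m": 1.0}
applicant = {"debt_to_income": 0.45, "revolving_utilization": 0.90, "inquiries_6m": 0.0}

codes = reason_codes(coefficients, applicant, reference)
```

Here the applicant's revolving utilization contributes most to the risk score, followed by debt-to-income, while the below-reference inquiry count lowers risk and is excluded.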

Deliverables

Build and compare a baseline model and an optimized model for default prediction.
Show how you improve performance through preprocessing, feature engineering, and hyperparameter tuning.
Evaluate using appropriate classification metrics and threshold analysis.
Explain the optimization choices, tradeoffs, and deployment implications.
Provide production-ready Python code for training and evaluation.
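
The threshold analysis named in the deliverables could start from a table like the one sketched below. It assumes a score at or above the threshold flags an application as likely to default, and it reports the flagged rate, precision, and recall at each cutoff. The function name and the toy data are hypothetical.

```python
def threshold_table(y_true, scores, thresholds):
    """Compute flagged rate, precision, and recall at each score threshold."""
    rows = []
    for t in thresholds:
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        fn = sum(1 for y, s in zip(y_true, scores) if s < t and y == 1)
        flagged = tp + fp
        rows.append({
            "threshold": t,
            "flagged_rate": flagged / len(y_true),
            "precision": tp / flagged if flagged else 0.0,
            "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        })
    return rows


# Toy labels and scores, for illustration only.
y_true = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.6, 0.9]
rows = threshold_table(y_true, scores, thresholds=[0.3, 0.5, 0.7])
```

Reading the rows side by side shows the tradeoff the business context describes: a lower threshold catches more defaults but declines more good applicants, while a higher threshold does the reverse.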
