Classify Loan Default with Neural Networks

Business Context

LendFlow, a digital consumer lending platform processing roughly 200K loan applications per month, wants to improve default prediction at underwriting time. The risk team needs a neural-network-based classifier that can outperform a logistic regression baseline while remaining fast enough for batch retraining and low-latency scoring.

Dataset

The training data contains historical funded loans with borrower demographic, credit bureau, application, banking and cash-flow, and early repayment behavior features.

Feature Group	Count	Examples
Borrower demographics	6	age, employment_length_years, housing_status
Credit history	12	fico_score, revolving_utilization, delinquencies_2y
Application details	9	loan_amount, term_months, interest_rate, purpose
Banking and cash flow	8	monthly_income, debt_to_income, avg_balance_90d
Early payment behavior	7	first_payment_missed, autopay_enabled, days_past_due_30d

Size: 320K loans, 42 features
Target: Binary; default within 12 months of origination
Class balance: 11.4% default, 88.6% non-default
Missing data: ~9% missing in banking features, ~4% missing in employment fields, and sparse missingness in bureau attributes
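
The class balance above implies some simple arithmetic that is worth having on hand before training. The following is a minimal sketch, not a prescribed method: it derives the approximate class counts from the stated size and default rate, a positive-class weight of the kind often passed to a weighted binary cross-entropy loss, and the PR-AUC a random ranking would be expected to reach. The variable names are illustrative, and using a class weight at all is an assumption about the training setup.

```python
# Figures taken from the dataset summary: 320K loans, 11.4% default.
n_loans = 320_000
default_rate = 0.114

n_pos = round(n_loans * default_rate)  # approximate number of defaults
n_neg = n_loans - n_pos                # approximate number of non-defaults

# Ratio of negatives to positives; a common (assumed) choice for the
# positive-class weight in a weighted binary cross-entropy loss.
pos_weight = n_neg / n_pos

# A model that ranks loans at random has an expected PR-AUC roughly equal
# to the positive rate, which puts any PR-AUC target in context.
random_pr_auc = n_pos / n_loans

print(f"defaults={n_pos}, non-defaults={n_neg}")
print(f"pos_weight={pos_weight:.2f}, random PR-AUC={random_pr_auc:.3f}")
```

Under these numbers the positive weight is a little under 8, and a random ranking sits near 0.114 PR-AUC.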

Success Criteria

A strong solution should improve minority-class detection over a linear baseline and achieve:

ROC-AUC >= 0.84
PR-AUC >= 0.42
Recall >= 0.70 at precision >= 0.35
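
The recall-at-precision criterion can be checked by sweeping score thresholds on a held-out set. The sketch below is one possible way to do this in plain Python, under the assumption that labels are 1 for default and 0 otherwise and that a loan is flagged when its score is at or above the threshold. The function name `pick_threshold` and the toy data are hypothetical.

```python
def pick_threshold(y_true, scores, min_precision=0.35):
    """Return (threshold, precision, recall) with the highest recall among
    thresholds whose precision is at least min_precision, or None."""
    total_pos = sum(y_true)
    if total_pos == 0:
        raise ValueError("y_true contains no positive labels")

    # Visit loans from highest to lowest score; each distinct score is a
    # candidate threshold ("flag if score >= threshold").
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    best = None
    tp = fp = 0
    for rank, i in enumerate(order):
        if y_true[i] == 1:
            tp += 1
        else:
            fp += 1
        # Evaluate only once all loans tied at this score are counted.
        if rank + 1 < len(order) and scores[order[rank + 1]] == scores[i]:
            continue
        precision = tp / (tp + fp)
        recall = tp / total_pos
        if precision >= min_precision and (best is None or recall > best[2]):
            best = (scores[i], precision, recall)
    return best


# Toy example (made-up labels and scores, not results from the dataset).
y_toy = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
s_toy = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
result = pick_threshold(y_toy, s_toy, min_precision=0.6)
print(result)
```

When several thresholds reach the same recall, this sketch keeps the highest one, which also has the best precision among them. The success criterion is met if the returned recall is at least 0.70 with `min_precision=0.35`.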

Constraints

Inference must stay under 20 ms per application in online scoring.
The model should support monthly retraining on newly booked loans.
The risk team needs enough interpretability to review top drivers at portfolio level.
The solution must avoid leakage from post-decision features.
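
The leakage constraint can be enforced mechanically by tagging each feature group with whether it is known at decision time. The sketch below uses only the example feature names from the dataset table. Treating the early payment behavior group as post-decision is an assumption based on its name (those values are observed only after funding), and the dictionary keys and function name are hypothetical.

```python
# Example features per group, copied from the dataset table.
FEATURE_GROUPS = {
    "borrower_demographics": ["age", "employment_length_years", "housing_status"],
    "credit_history": ["fico_score", "revolving_utilization", "delinquencies_2y"],
    "application_details": ["loan_amount", "term_months", "interest_rate", "purpose"],
    "banking_and_cash_flow": ["monthly_income", "debt_to_income", "avg_balance_90d"],
    "early_payment_behavior": ["first_payment_missed", "autopay_enabled", "days_past_due_30d"],
}

# Assumption: these groups are only observed after the loan is funded,
# so they cannot be used for a score produced at underwriting time.
POST_DECISION_GROUPS = {"early_payment_behavior"}


def underwriting_features(groups, excluded_groups):
    """Return the features that are available when the decision is made."""
    return [
        feature
        for group, features in groups.items()
        if group not in excluded_groups
        for feature in features
    ]


allowed = underwriting_features(FEATURE_GROUPS, POST_DECISION_GROUPS)
print(allowed)
```

In practice the group assignment would need to be confirmed feature by feature with the data owners, since a single late-populated field inside an otherwise safe group would still leak.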

Deliverables

Build a neural network classifier for 12-month default prediction.
Explain why a neural network is appropriate versus simpler tabular baselines.
Design preprocessing for mixed numerical and categorical features with missing values.
Define a validation strategy and threshold-selection approach aligned to credit risk.
Report model performance, calibration, and operational tradeoffs for deployment.
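
For the preprocessing deliverable, one common pattern is to impute missing numeric values while keeping a flag that records the value was missing, and to reserve a dedicated id for missing or unseen categories. The sketch below illustrates that pattern in plain Python. It is an assumed approach, not one the problem statement mandates, and the helper names, the use of `None` for missing values, and the toy values are hypothetical.

```python
from statistics import median


def fit_numeric_imputer(train_values):
    """Learn the fill value from non-missing training values only."""
    observed = [v for v in train_values if v is not None]
    return median(observed)


def transform_numeric(values, fill_value):
    """Return (imputed_value, was_missing) pairs so the network can
    learn from the missingness pattern as well as the value."""
    return [(fill_value, 1) if v is None else (v, 0) for v in values]


def encode_categorical(values, vocabulary):
    """Map categories to integer ids; id 0 is reserved for missing
    or previously unseen categories."""
    return [vocabulary.get(v, 0) for v in values]


# Toy values for illustration only.
income_train = [3000.0, None, 5000.0, 4000.0]
fill = fit_numeric_imputer(income_train)
income_encoded = transform_numeric(income_train, fill)

housing_vocab = {"rent": 1, "own": 2, "mortgage": 3}
housing_encoded = encode_categorical(["own", None, "other"], housing_vocab)

print(income_encoded)
print(housing_encoded)
```

Fitting the fill value and vocabulary on the training split only, then reusing them for validation and scoring, keeps the preprocessing itself from leaking information across splits.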
