Build a Loan Default Classifier

Easy
Machine Learning
Supervised Learning, Cross-Validation, Feature Engineering

Problem

Business Context

LendWise, a digital consumer lending platform processing ~250K applications per quarter, wants a model to predict whether an approved borrower will default within 90 days of origination. The goal is to improve underwriting decisions while keeping the model explainable for risk and compliance teams.

Dataset

Feature Group | Count | Examples
Applicant demographics | 6 | age, employment_status, region, housing_status
Credit bureau variables | 11 | credit_score, delinquencies_12m, total_open_accounts, utilization_rate
Application attributes | 8 | loan_amount, term_months, interest_rate, purpose
Bank transaction aggregates | 9 | avg_monthly_income, income_volatility, nsf_count_90d, avg_balance
Behavioral / derived | 6 | application_hour, device_risk_score, prior_applications_30d, fraud_flag_history
  • Rows: 320K historical loans, 40 features
  • Target: default_90d — 1 if borrower misses payments and is charged off within 90 days, else 0
  • Class balance: 11.6% positive, 88.4% negative
  • Missing data: 18% missing in bank transaction features, 6% missing in bureau variables for thin-file applicants
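
One way to handle the missing bank transaction and bureau values is to impute them and keep a flag recording that the value was absent, since missingness for thin-file applicants may itself carry signal. The sketch below is a minimal illustration on toy rows, not LendWise data; the helper name `impute_with_indicator` and the choice of median imputation are assumptions, not something the problem statement prescribes.

```python
from statistics import median

def impute_with_indicator(rows, column):
    """Median-impute `column` and add a `<column>_missing` flag (0/1).

    `rows` is a list of dicts where None marks a missing value.
    The median is computed from the observed values only.
    Returns new rows plus the fill value so it can be reused at scoring time.
    """
    observed = [r[column] for r in rows if r[column] is not None]
    fill = median(observed)
    out = []
    for r in rows:
        missing = r[column] is None
        new = dict(r)  # copy so the input rows are left unchanged
        new[column] = fill if missing else r[column]
        new[f"{column}_missing"] = int(missing)
        out.append(new)
    return out, fill

# Toy rows (assumed values for illustration only).
toy = [
    {"avg_monthly_income": 4200.0},
    {"avg_monthly_income": None},
    {"avg_monthly_income": 3100.0},
    {"avg_monthly_income": 5800.0},
]
imputed, fill_value = impute_with_indicator(toy, "avg_monthly_income")
```

In a real pipeline the fill value would be fitted on the training fold only and then applied to validation, test, and live data, so that statistics from held-out rows do not leak into training.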

Success Criteria

A solution is considered good enough if it achieves ROC-AUC >= 0.82, PR-AUC >= 0.42, and recall >= 0.70 at precision >= 0.35 on a held-out test set. The candidate should also show a clear, repeatable model-building process from raw data to threshold selection.
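
The "recall >= 0.70 at precision >= 0.35" criterion can be read as a threshold search: among all score cutoffs whose precision is at least 0.35, take the one with the highest recall and check that it reaches 0.70. The following is a rough sketch of that search on toy labels and scores; the function name `pick_threshold` is hypothetical, and the simple loop is written for clarity rather than speed.

```python
def pick_threshold(y_true, scores, min_precision=0.35):
    """Return (threshold, precision, recall) with the highest recall among
    thresholds whose precision is >= min_precision, or None if none qualifies.

    A loan is flagged as a predicted default when score >= threshold.
    Ties in recall keep the higher threshold, which is visited first.
    """
    total_pos = sum(y_true)
    best = None
    for t in sorted(set(scores), reverse=True):
        flagged = [y for y, s in zip(y_true, scores) if s >= t]
        tp = sum(flagged)
        precision = tp / len(flagged)
        recall = tp / total_pos
        if precision >= min_precision and (best is None or recall > best[2]):
            best = (t, precision, recall)
    return best

# Toy labels and model scores (assumed values for illustration only).
y_true = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
result = pick_threshold(y_true, scores)
print(result)  # (0.4, 0.5, 1.0)
```

The threshold should be chosen on a validation split and then confirmed once on the held-out test set; tuning it on the test set would overstate the reported recall.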

Constraints

  • Predictions must be returned in <50 ms for online underwriting.
  • The model must support reason codes / feature importance for adverse action review.
  • Retraining is expected monthly using the latest approved-loan data.
  • Avoid leakage from post-origination variables.
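
Because the model is retrained monthly and the label takes 90 days to become final, one reasonable validation design is an out-of-time split that also drops loans too recent to have a settled `default_90d` value. The sketch below assumes each loan carries an origination date in a field named `originated`, which is a hypothetical name not given in the problem statement, and uses made-up dates.

```python
from datetime import date, timedelta

# default_90d is only observable once 90 days have passed since origination.
LABEL_WINDOW = timedelta(days=90)

def out_of_time_split(loans, snapshot, test_start):
    """Split loans by origination date.

    Loans originated less than 90 days before `snapshot` are dropped because
    their label is not yet final. Of the remaining loans, those originated on
    or after `test_start` form the test set and earlier ones the training set.
    """
    mature = [loan for loan in loans
              if loan["originated"] + LABEL_WINDOW <= snapshot]
    train = [loan for loan in mature if loan["originated"] < test_start]
    test = [loan for loan in mature if loan["originated"] >= test_start]
    return train, test

# Toy loans (assumed values for illustration only).
loans = [
    {"id": 1, "originated": date(2023, 1, 15)},
    {"id": 2, "originated": date(2023, 3, 10)},
    {"id": 3, "originated": date(2023, 5, 20)},
    {"id": 4, "originated": date(2023, 8, 1)},
]
train, test = out_of_time_split(
    loans, snapshot=date(2023, 9, 30), test_start=date(2023, 5, 1)
)
```

Here loan 4 is excluded entirely: it was originated fewer than 90 days before the snapshot, so treating it as a non-default would mislabel a loan whose outcome is still unknown.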

Deliverables

  1. Define a general model-building process for this classification task.
  2. Build and compare at least one baseline and one stronger model.
  3. Explain preprocessing, feature engineering, validation, and leakage checks.
  4. Select evaluation metrics and a decision threshold aligned to business risk.
  5. Describe how the model would be deployed and monitored in production.
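
For the monitoring deliverable, one commonly used drift check is the population stability index (PSI), which compares the binned distribution of a feature or model score in production against the training distribution. The sketch below is one possible formulation under stated assumptions: the bin edges, the epsilon floor for empty bins, and the toy score lists are all illustrative choices rather than part of the problem statement.

```python
import math

def psi(expected, actual, edges):
    """Population stability index between two samples of one feature or score.

    `edges` are ascending interior cut points, giving len(edges) + 1 bins.
    A value equal to an edge falls into the lower bin.
    """
    def shares(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            counts[sum(v > e for e in edges)] += 1
        # Floor at a small epsilon so empty bins do not produce log(0).
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

# Toy score samples (assumed values for illustration only).
train_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
live_scores = [0.5, 0.6, 0.7, 0.8, 0.8, 0.9, 0.9, 0.9]
edges = [0.25, 0.5, 0.75]

no_drift = psi(train_scores, train_scores, edges)
drift = psi(train_scores, live_scores, edges)
```

Identical distributions give a PSI of zero, and the value grows as the live population moves away from the training population. A monthly retraining job could compute this per feature and for the output score, alongside the realized default rate once labels mature.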
