Predict Loan Default with Logistic Regression

Difficulty: Easy
Category: Machine Learning
Topics: Supervised Learning, Cross-Validation, Feature Engineering
Asked at: BlackRock

Problem

Business Context

LendWise, a mid-size digital lending platform processing about 120K personal loan applications per quarter, wants a model to predict whether an approved borrower will default within 12 months. The risk team needs a simple, explainable model to support underwriting decisions and reduce charge-offs without materially lowering approval volume.

Dataset

The training data contains historical funded loans from the last 3 years.

Feature Group            Count  Examples
Applicant demographics   6      age, employment_length, home_ownership, state
Credit attributes        9      fico_score, revolving_utilization, delinquencies_2y, inquiries_6m
Loan attributes          7      loan_amount, interest_rate, term_months, purpose
Banking and income       6      annual_income, debt_to_income, monthly_obligations, verified_income
Behavioral / history     5      prior_loans, prior_defaults, days_since_last_loan, autopay_enrolled
  • Rows: 240K funded loans, 33 input features
  • Target: default_12m — whether the borrower becomes 90+ days delinquent within 12 months
  • Class balance: 11.5% default, 88.5% non-default
  • Missing data: ~8% missing in employment and income verification fields; ~3% missing in revolving utilization
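
The class balance above implies roughly 27,600 defaults among the 240K rows. One common way to handle that imbalance in logistic regression is inverse-frequency class weighting. The sketch below works out those weights from the stated 11.5% default rate. The function name is hypothetical, and the formula (n_samples / (n_classes * n_class_samples)) is one conventional choice, not something the problem statement prescribes.

```python
def balanced_class_weights(positive_rate):
    """Inverse-frequency weights for a binary target.

    Uses n_samples / (n_classes * n_class_samples), expressed with rates.
    """
    if not 0.0 < positive_rate < 1.0:
        raise ValueError("positive_rate must be strictly between 0 and 1")
    return {
        0: 1.0 / (2.0 * (1.0 - positive_rate)),  # majority class: non-default
        1: 1.0 / (2.0 * positive_rate),          # minority class: default
    }


# 11.5% default rate taken from the dataset description.
weights = balanced_class_weights(0.115)
print({label: round(w, 3) for label, w in weights.items()})
```

Reweighting shifts the predicted probabilities away from the true base rate, so a weighted model's scores would typically need recalibration before they are read as default probabilities.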

Success Criteria

A good solution should achieve ROC-AUC >= 0.78, recall >= 0.60 at precision >= 0.35, and provide coefficients or feature effects that the credit policy team can explain to auditors.
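
The two kinds of criteria above can be checked with a few lines of plain Python. The sketch below computes ROC-AUC as a pairwise ranking statistic and then searches for the threshold with the highest recall among those that keep precision at or above a floor. The function names and the ten-row toy data are illustrative assumptions, and the quadratic AUC loop is only suitable for small samples.

```python
def roc_auc(y_true, scores):
    """Chance that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def best_recall_at_precision(y_true, scores, min_precision):
    """Return (threshold, precision, recall) maximizing recall subject to
    precision >= min_precision, or None if no threshold qualifies."""
    total_pos = sum(y_true)
    if total_pos == 0:
        raise ValueError("no positive examples")
    best = None
    for t in sorted(set(scores)):
        # Everything scored at or above t is flagged as a predicted default.
        flagged = [y for y, s in zip(y_true, scores) if s >= t]
        tp = sum(flagged)
        precision = tp / len(flagged)
        recall = tp / total_pos
        if precision >= min_precision and (best is None or recall > best[2]):
            best = (t, precision, recall)
    return best


# Toy data only: 3 defaults out of 10 loans, scores are made up.
y = [0, 0, 0, 0, 0, 1, 0, 0, 1, 1]
s = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]

auc = roc_auc(y, s)
choice = best_recall_at_precision(y, s, min_precision=0.35)
print(round(auc, 4), choice)
```

On real validation data the same search would be run on held-out predictions, and the result compared against the stated targets of ROC-AUC >= 0.78 and recall >= 0.60 at precision >= 0.35.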

Constraints

  • The model must be interpretable enough for regulated lending review.
  • Batch scoring should complete in under 10 minutes for 150K applications.
  • Retraining should be feasible monthly with limited ML ops support.
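
To illustrate why logistic regression tends to satisfy the interpretability and scoring constraints above, here is a minimal sketch of how a fitted model scores one application and how its coefficients translate to odds ratios. The intercept and coefficient values are hypothetical placeholders on standardized features, not fitted to any LendWise data, and only three of the 33 features are shown.

```python
import math

# Hypothetical values for illustration only; a real model would be fitted.
INTERCEPT = -2.0
COEFFICIENTS = {
    "fico_score": -0.60,      # higher score lowers default odds
    "debt_to_income": 0.35,   # higher burden raises default odds
    "prior_defaults": 0.50,
}


def default_probability(features):
    """Logistic score: sigmoid of intercept plus weighted sum of features."""
    z = INTERCEPT + sum(beta * features[name] for name, beta in COEFFICIENTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def odds_ratios():
    """exp(beta): multiplicative change in default odds per one-unit increase."""
    return {name: math.exp(beta) for name, beta in COEFFICIENTS.items()}


average_applicant = {"fico_score": 0.0, "debt_to_income": 0.0, "prior_defaults": 0.0}
strong_credit = {"fico_score": 1.0, "debt_to_income": 0.0, "prior_defaults": 0.0}

print(round(default_probability(average_applicant), 4))
print(round(default_probability(strong_credit), 4))
print({name: round(ratio, 3) for name, ratio in odds_ratios().items()})
```

Each score is a single weighted sum followed by a sigmoid, and the batch target of 150K applications in 10 minutes works out to 250 applications per second.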

Deliverables

  1. Build a binary classification model to predict 12-month default.
  2. Explain why the chosen algorithm fits this business problem.
  3. Describe preprocessing for missing values, categorical variables, and skewed numeric features.
  4. Evaluate the model using threshold-free and threshold-based metrics.
  5. Recommend a decision threshold for underwriting based on business tradeoffs.
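
For the preprocessing deliverable, one possible approach is median imputation with a missing-value indicator, a log transform for skewed amounts, and one-hot encoding for categoricals. The sketch below shows each step in plain Python. The helper names and sample values are assumptions made for illustration, and the problem statement does not mandate these particular techniques.

```python
import math
from statistics import median


def impute_with_indicator(values):
    """Fill None with the median of observed values; also return 0/1 missing flags."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = median(observed)
    filled = [fill if v is None else v for v in values]
    flags = [1 if v is None else 0 for v in values]
    return filled, flags


def log1p_transform(values):
    """Compress right-skewed, non-negative values such as income."""
    return [math.log1p(v) for v in values]


def one_hot(values, categories):
    """Encode against a fixed category list; unseen categories become all zeros."""
    return [[1 if v == c else 0 for c in categories] for v in values]


# Made-up sample rows using column names from the dataset description.
annual_income = [52000, None, 48000, 250000]
home_ownership = ["RENT", "OWN", "MORTGAGE", "OTHER"]

income_filled, income_missing = impute_with_indicator(annual_income)
income_logged = log1p_transform(income_filled)
ownership_encoded = one_hot(home_ownership, ["RENT", "OWN", "MORTGAGE"])

print(income_filled, income_missing)
print(ownership_encoded)
```

In a cross-validated workflow, the median and the category list would be learned on the training folds only and then applied to the validation fold, to avoid leaking information.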
