Justify Loan Default Model Choice

Business Context

NovaBank, a digital lender that processes roughly 120K personal loan applications per month, wants a default-risk model for pre-approval decisions. The hiring manager cares less about raw modeling than about whether you can clearly justify your choice of library and algorithm for a real production problem.

Dataset

You are given a historical loan dataset built from application-time features only.

Feature Group	Count	Examples
Applicant financials	12	annual_income, debt_to_income, revolving_utilization, existing_loans
Credit history	9	fico_band, delinquencies_12m, inquiries_6m, oldest_trade_age
Application details	7	loan_amount, term_months, purpose, channel
Employment & demographics	8	employment_length, home_ownership, region, age_band
Derived risk features	6	payment_to_income_ratio, recent_inquiry_rate, utilization_bucket

The dataset contains both numerical and categorical variables.
Some categorical fields have moderate cardinality.
Missingness is concentrated in employment and bureau-derived fields.
The target is whether the applicant defaults within 12 months of origination.
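The data properties above (mixed types, moderate-cardinality categoricals, missingness concentrated in a few field groups) usually drive the preprocessing design. A minimal, dependency-free sketch, assuming rows arrive as dicts and using a hypothetical subset of columns: medians and category vocabularies are learned on training rows only, missing values get explicit indicator flags, and rare categorical levels are pooled. In the risk team's Python stack this would more likely be scikit-learn's `SimpleImputer(add_indicator=True)` plus `OneHotEncoder(min_frequency=..., handle_unknown="infrequent_if_exist")`.

```python
import statistics
from collections import Counter

# Hypothetical subset of the 42 columns, for illustration only.
NUMERIC = ["annual_income", "debt_to_income", "employment_length"]
CATEGORICAL = ["purpose", "region"]


def fit_preprocessor(rows, min_count=2):
    """Learn imputation medians and category vocabularies from training rows only."""
    medians = {
        col: statistics.median(r[col] for r in rows if r[col] is not None)
        for col in NUMERIC
    }
    vocab = {}
    for col in CATEGORICAL:
        counts = Counter(r[col] for r in rows if r[col] is not None)
        # Levels seen fewer than min_count times are pooled later as "__other__".
        vocab[col] = sorted(level for level, n in counts.items() if n >= min_count)
    return {"medians": medians, "vocab": vocab}


def transform(row, state):
    """Turn one application dict into a flat numeric feature dict."""
    out = {}
    for col in NUMERIC:
        value = row.get(col)
        # Missingness in employment/bureau fields can itself carry signal, so flag it.
        out[f"{col}__missing"] = 1.0 if value is None else 0.0
        out[col] = float(state["medians"][col] if value is None else value)
    for col in CATEGORICAL:
        level = row.get(col)
        if level is None:
            level = "__missing__"
        elif level not in state["vocab"][col]:
            level = "__other__"  # rare or unseen level
        for known in state["vocab"][col] + ["__other__", "__missing__"]:
            out[f"{col}={known}"] = 1.0 if level == known else 0.0
    return out


train = [
    {"annual_income": 52000, "debt_to_income": 0.31, "employment_length": 4,
     "purpose": "debt_consolidation", "region": "north"},
    {"annual_income": 61000, "debt_to_income": 0.22, "employment_length": None,
     "purpose": "debt_consolidation", "region": "south"},
    {"annual_income": 38000, "debt_to_income": 0.45, "employment_length": 1,
     "purpose": "car", "region": "north"},
]
state = fit_preprocessor(train)
x = transform({"annual_income": 45000, "debt_to_income": 0.40, "employment_length": None,
               "purpose": "wedding", "region": "north"}, state)
```

Fitting on the training split only, then reusing `state` at inference time, keeps the monthly retrain and the online scorer consistent.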

Success Criteria

A strong solution should achieve a holdout ROC-AUC of at least 0.78 and a PR-AUC above the default-rate baseline (a random ranking of applicants scores a PR-AUC roughly equal to the observed default rate), while remaining explainable enough for risk and compliance review. Your justification of the chosen library and algorithm should be specific, evidence-based, and tied to the dataset and production constraints.
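Both thresholds can be checked mechanically on held-out scores. A dependency-free sketch with toy labels and scores invented for illustration; PR-AUC is estimated as average precision, the usual step-wise estimate. In practice `sklearn.metrics.roc_auc_score` and `sklearn.metrics.average_precision_score` compute the same quantities efficiently; the pairwise AUC loop here is O(n²) and only meant to make the definition explicit.

```python
def roc_auc(y_true, scores):
    """P(random defaulter scores above random non-defaulter); ties count 1/2.

    Pairwise O(n^2) loop, written to show the definition, not for 120K rows.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Step-wise PR-AUC: mean precision at each rank where a defaulter appears.

    Tied scores are broken by sort order here; sklearn treats them as one threshold.
    """
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits, ap = 0, 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            ap += hits / k
    return ap / hits


def check_success_criteria(y_true, scores, min_auc=0.78):
    base_rate = sum(y_true) / len(y_true)  # roughly the PR-AUC of a random ranking
    auc = roc_auc(y_true, scores)
    pr_auc = average_precision(y_true, scores)
    return {"roc_auc": auc, "pr_auc": pr_auc, "base_rate": base_rate,
            "passes": auc >= min_auc and pr_auc > base_rate}


# Toy holdout labels (1 = defaulted within 12 months) and model scores.
y_test = [0, 0, 1, 0, 1, 0, 0, 1, 0, 0]
s_test = [0.05, 0.10, 0.80, 0.20, 0.40, 0.30, 0.15, 0.70, 0.50, 0.08]
report = check_success_criteria(y_test, s_test)
```

Reporting the base rate next to PR-AUC makes the second criterion self-evidently relative rather than an absolute number.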

Constraints

Inference latency must stay under 50 ms per application.
The model must support reason-code-style explanations, i.e., a per-applicant list of the main factors behind each score.
Retraining will happen monthly.
The first production version should use the Python ecosystem already adopted by the risk team.
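Reason codes and the latency budget are easiest to satisfy with an additive model. A minimal sketch, assuming a logistic-regression scorecard on standardized features with hypothetical coefficients (`INTERCEPT`, `COEFS`) and a reference applicant at the population mean: each feature's contribution to the log-odds is `coef * x`, the largest positive contributions become the reason codes, and a small timing harness checks 99th-percentile scoring latency against the 50 ms budget. For a tree ensemble, the analogous per-applicant contributions typically come from SHAP values instead.

```python
import math
import statistics
import time

# Hypothetical coefficients for a logistic scorecard on standardized features.
INTERCEPT = -2.5
COEFS = {
    "debt_to_income": 0.9,
    "revolving_utilization": 0.7,
    "inquiries_6m": 0.5,
    "delinquencies_12m": 0.8,
    "annual_income": -0.6,
}


def score_with_reasons(x, top_k=2):
    """Return default probability plus the top_k features pushing risk up.

    With standardized inputs the reference applicant is all zeros, so each
    term coef * x is that feature's contribution to the log-odds.
    """
    contributions = {name: coef * x.get(name, 0.0) for name, coef in COEFS.items()}
    log_odds = INTERCEPT + sum(contributions.values())
    prob = 1.0 / (1.0 + math.exp(-log_odds))
    adverse = sorted(
        (item for item in contributions.items() if item[1] > 0),
        key=lambda item: item[1],
        reverse=True,
    )
    return prob, [name for name, _ in adverse[:top_k]]


def p99_latency_ms(fn, payload, n_calls=2000):
    """Time repeated calls and return the 99th-percentile latency in milliseconds."""
    timings = []
    for _ in range(n_calls):
        start = time.perf_counter()
        fn(payload)
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.quantiles(timings, n=100)[98]


applicant = {"debt_to_income": 1.8, "revolving_utilization": 0.4,
             "inquiries_6m": 2.0, "delinquencies_12m": 0.0, "annual_income": -0.5}
prob, reasons = score_with_reasons(applicant)
latency = p99_latency_ms(score_with_reasons, applicant)
print(f"P(default)={prob:.3f} reasons={reasons} p99={latency:.3f} ms (budget 50 ms)")
```

The tail (p99) is measured rather than the mean because the constraint applies per application. The same check should be repeated inside the real serving path, where feature retrieval is likely to cost far more than the dot product.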

Deliverables

Train a baseline and a final classification model for 12-month default prediction.
Explain why your chosen library and algorithm are appropriate versus at least one alternative.
Show preprocessing, validation, and threshold selection.
Report evaluation metrics on a held-out test set.
Provide a short production-oriented justification covering interpretability, latency, and maintainability.
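Validation and threshold selection both interact with the monthly retraining cadence. A minimal sketch, assuming each record carries a hypothetical `app_month` field in `YYYY-MM` form and an already-computed model score: an out-of-time split trains on earlier months and validates on later ones, and the decline threshold is chosen to minimize total cost under illustrative, assumed costs (a missed defaulter costs five times as much as a wrongly declined good applicant). The records and scores are made up.

```python
def out_of_time_split(records, cutoff):
    """Train on application months before `cutoff` (YYYY-MM), validate on the rest.

    Only include applications whose 12-month outcome window has fully closed;
    otherwise the most recent months look artificially safe.
    """
    train = [r for r in records if r["app_month"] < cutoff]
    valid = [r for r in records if r["app_month"] >= cutoff]
    return train, valid


def pick_threshold(y_true, scores, cost_fn=5.0, cost_fp=1.0):
    """Return the decline threshold (score >= t is declined) with the lowest total cost.

    cost_fn: assumed cost of approving an applicant who then defaults.
    cost_fp: assumed cost of declining an applicant who would have repaid.
    """
    best_t, best_cost = None, float("inf")
    for t in sorted(set(scores)) + [float("inf")]:  # inf means approve everyone
        fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < t)
        fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= t)
        cost = cost_fn * fn + cost_fp * fp
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t, best_cost


# Hypothetical records; "score" stands in for a model trained on the train split.
records = [
    {"app_month": "2023-01", "default": 0, "score": 0.12},
    {"app_month": "2023-02", "default": 1, "score": 0.55},
    {"app_month": "2023-03", "default": 0, "score": 0.20},
    {"app_month": "2023-04", "default": 0, "score": 0.35},
    {"app_month": "2023-04", "default": 1, "score": 0.60},
    {"app_month": "2023-05", "default": 0, "score": 0.45},
    {"app_month": "2023-05", "default": 1, "score": 0.30},
    {"app_month": "2023-06", "default": 0, "score": 0.05},
]
train, valid = out_of_time_split(records, cutoff="2023-04")
threshold, cost = pick_threshold([r["default"] for r in valid], [r["score"] for r in valid])
```

The real cost ratio would come from NovaBank's own loss and margin figures; capping the approval rate or the expected bad rate is an equally common way to set the cutoff.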
