Defend Loan Default Model Choice

Business Context

Northstar Lending, a digital consumer lender processing ~120K applications per month, wants a default-risk model for pre-approval decisions. In the take-home defense, you must not only build a classifier but also justify why your chosen algorithm is the right production choice versus simpler and more complex alternatives.

Dataset

You are given a historical application dataset with one row per funded loan.

Feature Group	Count	Examples
Applicant demographics	6	age_band, employment_type, region, housing_status
Financial profile	11	annual_income, debt_to_income, existing_loans, credit_utilization
Credit history	8	delinquency_count_12m, bureau_score, inquiries_6m, oldest_trade_age
Loan attributes	5	loan_amount, term_months, interest_rate, purpose
Behavioral / application	6	application_channel, time_on_form, document_resubmits, device_type

Size: 240K loans, 36 features
Target: default_90d — whether the borrower defaulted within 90 days of origination
Class balance: 11.5% default, 88.5% non-default
Missing data: ~9% missing in bureau features, ~4% missing in self-reported income fields
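
One way to handle the missing bureau and income fields is to impute a value and keep a flag recording that the field was missing, since missingness itself may carry signal. The following is a minimal sketch using only the standard library. The helper name `impute_with_indicator` and the toy values are illustrative assumptions, not part of the provided dataset, and in practice the fill value would be computed on the training split only.

```python
from statistics import median


def impute_with_indicator(values):
    """Median-impute None entries and return (filled_values, missing_flags).

    Hypothetical helper for illustration; the fill value should be learned
    on training data and reused unchanged at scoring time.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = median(observed)
    filled = [fill if v is None else v for v in values]
    flags = [1 if v is None else 0 for v in values]
    return filled, flags


# Toy bureau_score column (made-up values, not taken from the dataset).
bureau_score = [712, None, 655, 690, None, 730]
filled, flags = impute_with_indicator(bureau_score)
print(filled)
print(flags)
```

The flag column can be passed to the model alongside the imputed value, which lets a linear baseline and a tree-based model both use the fact that a bureau pull returned nothing.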

Success Criteria

A strong solution should achieve ROC-AUC >= 0.82 and PR-AUC >= 0.42 on the holdout set, while providing a clear, defensible explanation of model choice, tradeoffs, and limitations.
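
To make the two thresholds concrete, the sketch below computes both metrics from scratch on a toy holdout. It assumes that PR-AUC is estimated by average precision (the problem statement does not say which estimator to use) and that scores contain no ties for the average-precision part. The function names and toy data are illustrative; the pairwise ROC-AUC loop is quadratic, so a library implementation would be the practical choice at 240K rows.

```python
def roc_auc(y_true, scores):
    """Probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Mean of precision@k taken at the rank of each positive (no tied scores)."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive label")
    return total / hits


# Toy holdout: 3 defaults out of 10 loans (made-up labels and scores).
y_true = [0, 0, 1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.05, 0.10, 0.80, 0.20, 0.35, 0.30, 0.15, 0.90, 0.40, 0.25]

auc = roc_auc(y_true, scores)
ap = average_precision(y_true, scores)
meets_criteria = auc >= 0.82 and ap >= 0.42
print(auc, ap, meets_criteria)
```

A useful reference point when defending the PR-AUC number: a model that ranks at random has an expected average precision roughly equal to the default rate, so at 11.5% prevalence the 0.42 target is a little under four times the no-skill level.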

Constraints

Predictions must run in <50 ms per application in an online API.
Risk and compliance stakeholders require reasonable interpretability.
Retraining should be feasible on a monthly cadence with standard Python tooling.
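
The latency constraint is easier to defend with a measurement than with an estimate. The sketch below times repeated single-application calls and reports nearest-rank percentiles. `score_application`, its weights, and the payload are hypothetical stand-ins for whatever model is finally chosen, and the timing covers only in-process scoring, not network or feature-fetch overhead.

```python
import math
import time

# Hypothetical stand-in model: a two-feature logistic score.
WEIGHTS = {"debt_to_income": 2.0, "credit_utilization": 1.5}
BIAS = -3.0


def score_application(features):
    """Return a default probability from a weighted sum of features."""
    z = BIAS + sum(w * features.get(name, 0.0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def latency_percentiles_ms(fn, payload, n_calls=2000):
    """Time n_calls single predictions and return p50/p95/p99 in milliseconds."""
    samples = []
    for _ in range(n_calls):
        start = time.perf_counter()
        fn(payload)
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()

    def nearest_rank(q):
        idx = max(0, math.ceil(q * len(samples)) - 1)
        return samples[idx]

    return {"p50": nearest_rank(0.50), "p95": nearest_rank(0.95), "p99": nearest_rank(0.99)}


payload = {"debt_to_income": 0.35, "credit_utilization": 0.60}
stats = latency_percentiles_ms(score_application, payload)
print(stats)
```

Reporting a tail percentile such as p99 rather than the mean is the more conservative way to compare against the 50 ms budget, because a per-application limit is broken by slow outliers, not by the average call.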

Deliverables

Train at least one baseline model and one final model.
Justify the final algorithm choice using performance, interpretability, and operational constraints.
Describe preprocessing and feature engineering decisions.
Evaluate with appropriate metrics for imbalanced classification.
Explain how you would defend this choice in a take-home assignment review, including why you did not choose at least two alternatives.
