Northstar Lending, a digital consumer lender processing ~120K applications per month, wants a default-risk model for pre-approval decisions. In the take-home defense, you must not only build a classifier but also justify why your chosen algorithm is the right production choice versus simpler and more complex alternatives.
You are given a historical application dataset with one row per funded loan. Its 36 features fall into five groups:

| Feature Group | Count | Examples |
|---|---|---|
| Applicant demographics | 6 | age_band, employment_type, region, housing_status |
| Financial profile | 11 | annual_income, debt_to_income, existing_loans, credit_utilization |
| Credit history | 8 | delinquency_count_12m, bureau_score, inquiries_6m, oldest_trade_age |
| Loan attributes | 5 | loan_amount, term_months, interest_rate, purpose |
| Behavioral / application | 6 | application_channel, time_on_form, document_resubmits, device_type |

The target is default_90d: whether the borrower defaulted within 90 days of origination.

A strong solution should achieve ROC-AUC >= 0.82 and PR-AUC >= 0.42 on the holdout set, and it should give a clear, defensible explanation of the model choice, its tradeoffs, and its limitations.
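
One way to make the two metric thresholds concrete is a small holdout check. The sketch below is illustrative and dependency-free. It assumes binary labels (1 means default_90d occurred), model scores where higher means riskier, and that PR-AUC is estimated by average precision, which the task statement does not specify. The function names (`roc_auc`, `average_precision`, `meets_targets`) are hypothetical.

```python
from typing import Dict, Sequence, Union

# Thresholds quoted in the task statement.
ROC_AUC_TARGET = 0.82
PR_AUC_TARGET = 0.42


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Chance that a random defaulter outscores a random non-defaulter.

    Ties count as half a win. The pairwise loop is quadratic, so this is
    for illustration rather than for large holdout sets.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present in y_true")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision: precision at each distinct score threshold,
    weighted by the recall gained at that threshold."""
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("y_true contains no positive labels")
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    ap, tp, i = 0.0, 0, 0
    while i < len(ranked):
        # Consume every row that shares the current score (tie group).
        j, tp_before = i, tp
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            tp += ranked[j][1]
            j += 1
        ap += ((tp - tp_before) / n_pos) * (tp / j)  # recall gain * precision
        i = j
    return ap


def meets_targets(
    y_true: Sequence[int], scores: Sequence[float]
) -> Dict[str, Union[float, bool]]:
    """Report both metrics and whether both thresholds are met."""
    roc = roc_auc(y_true, scores)
    pr = average_precision(y_true, scores)
    passed = roc >= ROC_AUC_TARGET and pr >= PR_AUC_TARGET
    return {"roc_auc": roc, "pr_auc": pr, "passed": passed}


if __name__ == "__main__":
    # Toy labels and scores, not drawn from the Northstar dataset.
    print(meets_targets([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

On a real holdout set, scikit-learn's `roc_auc_score` and `average_precision_score` compute the same quantities far more efficiently. Reporting both numbers matters because ROC-AUC can look strong on an imbalanced default target while precision among the highest-risk applicants remains low.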