Northstar Bank uses a binary classification model to predict whether a new loan applicant will default within 12 months. The current notebook-based model runs offline on sampled data, but the lending platform now needs a production-ready approach that can score 1.5M applications per month with stable latency, reproducible training, and measurable business impact.

Input features fall into four groups:

| Feature Group | Count | Examples |
|---|---|---|
| Applicant profile | 12 | age, employment_length, annual_income, education_level |
| Credit bureau | 15 | credit_score, revolving_utilization, delinquencies_12m, inquiries_6m |
| Loan attributes | 8 | loan_amount, term_months, interest_rate, channel |
| Behavioral / derived | 10 | debt_to_income, income_to_loan_ratio, recent_credit_velocity |

Target label: default_12m = 1 if the borrower defaults within 12 months of origination, and 0 otherwise.

A good solution should improve ranking quality over the current logistic regression baseline, support both batch and low-latency online inference, and provide enough transparency for risk and compliance review. The target online p95 inference latency is under 50 ms per request. The model is retrained monthly, and each training run must be fully reproducible.
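
The two quantitative acceptance criteria above (better ranking quality than the baseline, and p95 online latency under 50 ms) can be checked with a small amount of code. The sketch below is illustrative only: the brief does not name a ranking metric, so ROC AUC is assumed here, and the p95 uses the nearest-rank definition, which is one of several common conventions. The function names `roc_auc`, `p95`, and `meets_targets` are hypothetical and not part of any existing Northstar codebase.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based ROC AUC (Mann-Whitney U); tied scores get average ranks.

    Assumption: ROC AUC stands in for "ranking quality"; the brief does
    not specify the metric.
    """
    n = len(labels)
    if n != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")

    # Sort indices by score, then assign 1-based ranks, averaging over ties.
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    u_stat = pos_rank_sum - n_pos * (n_pos + 1) / 2
    return u_stat / (n_pos * n_neg)


def p95(latencies_ms: Sequence[float]) -> float:
    """Nearest-rank 95th percentile of per-request latencies."""
    if not latencies_ms:
        raise ValueError("no latency samples")
    ordered = sorted(latencies_ms)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


def meets_targets(
    candidate_auc: float,
    baseline_auc: float,
    latencies_ms: Sequence[float],
    p95_budget_ms: float = 50.0,  # latency target stated in the brief
) -> bool:
    """True if the candidate beats the baseline and stays within budget."""
    return candidate_auc > baseline_auc and p95(latencies_ms) < p95_budget_ms
```

In practice, `roc_auc` would be evaluated on the same held-out set for both the logistic regression baseline and the candidate model, and `latencies_ms` would come from timing the online scoring path under realistic load. A production evaluation would likely add further metrics and confidence intervals, which this sketch leaves out.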