Business Context
NorthStar Bank uses a credit risk model to predict whether a personal loan applicant will default within 12 months. The current logistic regression model is already in production, but risk leaders want a measurable improvement in recall and calibration without sacrificing interpretability or increasing decision latency.
Dataset
You are given a historical underwriting dataset used by the existing model.
| Feature Group | Count | Examples |
|---|---|---|
| Applicant profile | 8 | age, employment_length, annual_income, housing_status |
| Credit bureau | 10 | fico_score, delinquencies_2y, revolving_utilization, inquiries_6m |
| Loan attributes | 6 | loan_amount, term_months, interest_rate, purpose |
| Banking behavior | 7 | avg_balance_90d, overdraft_count_6m, direct_deposit_flag |
| Derived temporal features | 5 | days_since_last_delinquency, income_trend_3m |
- Rows: 240,000 loan applications from the last 24 months
- Target: default_12m (whether the applicant defaulted within 12 months of origination)
- Class balance: 11.5% default, 88.5% non-default
- Missing data: 12% of values missing in banking features and 6% in bureau variables, with higher missingness for thin-file applicants
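To make the class balance concrete, the short calculation below turns the stated row count and default rate into absolute counts. It also prints the precision a no-skill classifier would achieve, which equals the default prevalence and is a useful floor when reading precision-recall results. The variable names are illustrative only and do not come from the dataset.

```python
# Figures taken from the dataset description above.
ROWS = 240_000
DEFAULT_RATE = 0.115

defaults = round(ROWS * DEFAULT_RATE)
non_defaults = ROWS - defaults

# A classifier that flags applicants at random has precision equal to the
# prevalence of the positive class, whatever its recall.
no_skill_precision = defaults / ROWS

print(f"defaults: {defaults:,}")                          # defaults: 27,600
print(f"non-defaults: {non_defaults:,}")                  # non-defaults: 212,400
print(f"no-skill precision: {no_skill_precision:.3f}")    # no-skill precision: 0.115
```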
Success Criteria
A successful solution should improve the existing model by:
- Increasing PR-AUC by at least 0.05 over baseline
- Achieving recall >= 0.70 at a decision threshold where precision >= 0.35
- Producing stable, explainable feature effects suitable for model risk review
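The recall and precision criterion above can be read as a constrained search: among all thresholds whose precision is at least 0.35, pick the one with the highest recall, then check whether that recall reaches 0.70. The sketch below is one minimal way to do this in plain Python. The function names `precision_recall_at` and `pick_threshold` are hypothetical, and the quadratic loop is written for readability rather than speed on 240,000 rows.

```python
from typing import Optional, Sequence, Tuple


def precision_recall_at(
    y_true: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Precision and recall when predicting default for scores >= threshold."""
    tp = fp = fn = 0
    for y, s in zip(y_true, scores):
        flagged = s >= threshold
        if flagged and y == 1:
            tp += 1
        elif flagged and y == 0:
            fp += 1
        elif not flagged and y == 1:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def pick_threshold(
    y_true: Sequence[int], scores: Sequence[float], min_precision: float = 0.35
) -> Optional[Tuple[float, float, float]]:
    """Return (threshold, precision, recall) with the highest recall among
    thresholds meeting the precision floor, or None if none qualifies."""
    best = None
    for t in sorted(set(scores)):
        p, r = precision_recall_at(y_true, scores, t)
        if p >= min_precision and (best is None or r > best[2]):
            best = (t, p, r)
    return best


# Toy labels and scores, invented purely for illustration.
y_demo = [0, 0, 1, 0, 1, 1]
s_demo = [0.1, 0.2, 0.3, 0.4, 0.8, 0.9]
choice = pick_threshold(y_demo, s_demo, min_precision=0.35)
meets_recall_target = choice is not None and choice[2] >= 0.70
```

The threshold should be chosen on a validation split and then confirmed on a separate holdout, since a threshold tuned and scored on the same data will overstate recall.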
Constraints
- Inference must complete in <50 ms per application in an online underwriting API
- The final model must support reason codes or feature importance explanations
- Retraining can happen monthly, but the feature pipeline must remain simple and auditable
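Two of these constraints lend themselves to a quick check in code. For a linear model such as the logistic regression baseline, one common way to derive reason codes is to rank features by their contribution to the log-odds relative to a reference applicant. The sketch below assumes that approach. The coefficients, baseline values, and the helper names `reason_codes` and `p99_latency_ms` are hypothetical and exist only to illustrate the idea. The timing helper measures in-process call time only, not network or feature-lookup time.

```python
import time
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical coefficients and reference values, not fitted on real data.
COEFS = {"revolving_utilization": 2.0, "fico_score": -0.01, "inquiries_6m": 0.3}
BASELINE = {"revolving_utilization": 0.3, "fico_score": 700, "inquiries_6m": 1}
APPLICANT = {"revolving_utilization": 0.8, "fico_score": 640, "inquiries_6m": 1}


def reason_codes(
    coefs: Dict[str, float],
    baseline: Dict[str, float],
    applicant: Dict[str, float],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Top features pushing this applicant's log-odds of default upward,
    measured against the baseline applicant."""
    contributions = {
        name: coefs[name] * (applicant[name] - baseline[name]) for name in coefs
    }
    adverse = [(name, c) for name, c in contributions.items() if c > 0]
    adverse.sort(key=lambda item: item[1], reverse=True)
    return adverse[:top_k]


def p99_latency_ms(predict_fn: Callable[[dict], object], payloads: Sequence[dict]) -> float:
    """Approximate 99th-percentile wall-clock time per call, in milliseconds."""
    durations = []
    for payload in payloads:
        start = time.perf_counter()
        predict_fn(payload)
        durations.append((time.perf_counter() - start) * 1000.0)
    durations.sort()
    return durations[min(len(durations) - 1, int(0.99 * len(durations)))]


codes = reason_codes(COEFS, BASELINE, APPLICANT)
latency = p99_latency_ms(lambda p: reason_codes(COEFS, BASELINE, p), [APPLICANT] * 200)
```

For a tree-based challenger the contribution step would need a different attribution method, so the ranking logic here applies to linear models only.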
Deliverables
- Propose how you would improve the existing production model and justify the changes.
- Build a training pipeline that handles missing values, mixed feature types, and moderate class imbalance.
- Compare at least two model families, including the current logistic regression baseline.
- Tune the decision threshold for the stated business objective.
- Report offline metrics, feature importance, and deployment considerations.