LendWise is building a logistic regression model to predict 90-day loan default. Before modeling, the team wants to use correlation analysis to remove weak predictors and flag redundant features.
You are given correlations between five candidate features and the binary target default_flag, plus pairwise correlations among the features. Use correlation analysis to decide which features should be kept, dropped, or reviewed further before fitting the model.
Assume the team computed Pearson correlations on a training sample of 1,200 applicants.
| Variable | Correlation with default_flag |
|---|---|
| credit_utilization | 0.48 |
| debt_to_income | 0.41 |
| late_payments_12m | 0.37 |
| annual_income | -0.09 |
| months_with_bank | -0.03 |
Pairwise feature correlations:
| Feature Pair | Correlation |
|---|---|
| credit_utilization, debt_to_income | 0.82 |
| credit_utilization, late_payments_12m | 0.29 |
| debt_to_income, late_payments_12m | 0.34 |
| annual_income, debt_to_income | -0.58 |
| annual_income, months_with_bank | 0.11 |
Use a two-sided significance level of . For screening, treat absolute correlation below 0.05 as operationally negligible, and treat pairwise feature correlation above 0.80 as a multicollinearity warning.
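The two screening thresholds can be applied mechanically. The sketch below is one minimal way to do it, using the correlations from the tables above. The function name `screen` and the constants `NEGLIGIBLE` and `COLLINEAR` are illustrative, not part of the exercise, and taking the absolute value of the pairwise correlations is an assumption (the rule as stated only says "above 0.80").

```python
# Correlations with default_flag, copied from the first table.
target_corr = {
    "credit_utilization": 0.48,
    "debt_to_income": 0.41,
    "late_payments_12m": 0.37,
    "annual_income": -0.09,
    "months_with_bank": -0.03,
}

# Pairwise feature correlations, copied from the second table.
pair_corr = {
    ("credit_utilization", "debt_to_income"): 0.82,
    ("credit_utilization", "late_payments_12m"): 0.29,
    ("debt_to_income", "late_payments_12m"): 0.34,
    ("annual_income", "debt_to_income"): -0.58,
    ("annual_income", "months_with_bank"): 0.11,
}

NEGLIGIBLE = 0.05  # |r| with the target below this is operationally negligible
COLLINEAR = 0.80   # pairwise |r| above this is a multicollinearity warning


def screen(target_corr, pair_corr, negligible=NEGLIGIBLE, collinear=COLLINEAR):
    """Return (features with negligible target correlation, collinear pairs)."""
    negligible_features = [
        name for name, r in target_corr.items() if abs(r) < negligible
    ]
    # Assumption: the sign is ignored, so a strong negative pair is also flagged.
    collinear_pairs = [
        pair for pair, r in pair_corr.items() if abs(r) > collinear
    ]
    return negligible_features, collinear_pairs


negligible_features, collinear_pairs = screen(target_corr, pair_corr)
print("Negligible:", negligible_features)
print("Collinear pairs:", collinear_pairs)
```

With these inputs the only negligible feature is months_with_bank, and the only pair above the warning threshold is credit_utilization with debt_to_income. A flagged pair is a prompt for review, not an automatic drop.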
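annual_income is significantly correlated with the target at conventional two-sided levels (r = -0.09, t ≈ -3.13, p ≈ 0.002 with n = 1,200), but months_with_bank is not (r = -0.03, t ≈ -1.04, p ≈ 0.30), and its absolute correlation also falls below the 0.05 screening threshold.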
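A significance claim about a small correlation can be checked with the standard t statistic for a Pearson correlation, t = r·sqrt(n − 2) / sqrt(1 − r²), with n − 2 degrees of freedom. The sketch below applies it to the two weakest features with n = 1,200. The helper names are illustrative, and the p-value uses a normal approximation to the t distribution, which is an assumption that should be close at 1,198 degrees of freedom but is not exact.

```python
import math

N = 1200  # training sample size stated in the exercise


def pearson_t(r, n):
    """t statistic for testing H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def approx_two_sided_p(t):
    """Two-sided p-value via the normal approximation (assumes large df)."""
    return math.erfc(abs(t) / math.sqrt(2))


weak_features = {"annual_income": -0.09, "months_with_bank": -0.03}

results = {}
for name, r in weak_features.items():
    t = pearson_t(r, N)
    p = approx_two_sided_p(t)
    results[name] = (t, p)
    print(f"{name}: r = {r:+.2f}, t = {t:.2f}, approx. two-sided p = {p:.4f}")
```

Each approximate p-value is then compared with the chosen two-sided significance level. With a sample this large, even a weak correlation can be statistically significant, so the operational threshold on |r| still matters for screening.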