Business Context
FinTrust, a financial institution with over 1 million customers, aims to enhance its credit scoring model to reduce default rates and improve customer acquisition. The current model relies on logistic regression, but the team is exploring ensemble methods to improve predictive performance and robustness.
Dataset
| Feature Group | Count | Examples |
|---|
| Demographics | 10 | age, income, employment_status |
| Financial History | 15 | credit_score, outstanding_loans, payment_history |
| Transactional Data | 20 | avg_monthly_spending, transaction_count, savings_balance |
- Size: 500K records, 45 features
- Target: Binary — defaulted (1) vs not defaulted (0)
- Class balance: Imbalanced — 5% positive (defaulted), 95% negative (not defaulted)
- Missing data: 10% missing in financial history features, 3% in demographics
Requirements
- Discuss the advantages of using ensemble methods (e.g., bagging, boosting) for this credit scoring problem.
- Propose a specific ensemble method to implement, justifying your choice based on dataset characteristics.
- Define a strategy for handling the class imbalance in the dataset.
- Outline evaluation metrics that would be most relevant for assessing model performance in this context.
Constraints
- The model must be interpretable enough for compliance and regulatory reviews.
- Inference latency should not exceed 2 seconds per application.
- The solution should be cost-effective, considering computational resources.