Business Context
RetailCo, a mid-sized online retailer with 200K active customers, wants to improve its marketing by segmenting customers according to their purchasing behavior. The marketing team plans to use historical transaction data to classify each customer into a segment for targeted promotions.
Dataset
| Feature Group | Count | Examples |
|---|---|---|
| Transaction Metrics | 15 | total_spent, purchase_frequency |
| Customer Demographics | 10 | age, gender, location |
| Behavioral Data | 8 | last_purchase_date, avg_discount_used |
- Size: 50K customer records, 33 features
- Target: Categorical — customer segments (e.g., 'high_value', 'low_value', 'new_customer')
- Class balance: Imbalanced — 5% high_value, 15% low_value, 80% new_customer
- Missing data: 10% missing in transaction metrics, 5% in demographics
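The class shares listed above imply a baseline that is worth computing before any model is trained. The sketch below works out the accuracy and F1 of a trivial classifier that always predicts the largest class. It assumes macro-averaged F1 (the unweighted mean of per-class F1), which the brief does not specify; the function name `majority_baseline` is made up for illustration.

```python
# Class shares taken from the "Class balance" bullet above.
class_share = {"high_value": 0.05, "low_value": 0.15, "new_customer": 0.80}


def majority_baseline(shares):
    """Accuracy and macro-F1 of a model that always predicts the largest class."""
    majority = max(shares, key=shares.get)
    accuracy = shares[majority]  # only majority-class records are classified correctly

    f1_per_class = {}
    for label, share in shares.items():
        if label == majority:
            precision = share  # every record is predicted as this class
            recall = 1.0       # so every true member is recovered
            f1_per_class[label] = 2 * precision * recall / (precision + recall)
        else:
            f1_per_class[label] = 0.0  # never predicted, so recall is 0

    # Assumption: macro averaging, i.e. each class counts equally.
    macro_f1 = sum(f1_per_class.values()) / len(f1_per_class)
    return majority, accuracy, macro_f1


majority, accuracy, macro_f1 = majority_baseline(class_share)
print(f"{majority}: accuracy={accuracy:.2f}, macro-F1={macro_f1:.3f}")
# prints "new_customer: accuracy=0.80, macro-F1=0.296"
```

Under these assumptions, always predicting `new_customer` already reaches 80% accuracy while its macro-F1 stays near 0.30, so on this dataset the F1 figure is the more informative of the two metrics.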
Requirements
- Build and compare a Random Forest and a Decision Tree model for customer segmentation.
- Achieve at least 75% accuracy and an F1 score of at least 70% when classifying customer segments.
- Provide feature importance analysis for both models to guide marketing strategies.
- Discuss the advantages and disadvantages of using Random Forest over a Decision Tree for this dataset.
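One possible way to set up the comparison is sketched below with scikit-learn. Everything about the data is an assumption: the real RetailCo records are not available here, so the code generates a synthetic stand-in with 33 numeric features, the stated 5% / 15% / 80% class split, and missing values at the stated rates. Median imputation, `class_weight="balanced"`, the tree depths, and macro-averaged F1 are illustrative choices, not settings given in the brief.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for the real data (assumption): 5,000 rows, 33 numeric features.
n_rows, n_features = 5_000, 33
X = rng.normal(size=(n_rows, n_features))

# Labels depend on the first two features and follow the 80% / 15% / 5% split.
score = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n_rows)
cuts = np.quantile(score, [0.80, 0.95])
labels = np.array(["new_customer", "low_value", "high_value"])
y = labels[np.digitize(score, cuts)]

# Simulate missing values: 10% in the first 15 columns, 5% in the next 10.
missing = np.zeros(X.shape, dtype=bool)
missing[:, :15] = rng.random((n_rows, 15)) < 0.10
missing[:, 15:25] = rng.random((n_rows, 10)) < 0.05
X[missing] = np.nan

# Stratify so the rare classes appear in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Fixed, modest settings instead of a hyperparameter search.
models = {
    "decision_tree": DecisionTreeClassifier(
        max_depth=5, class_weight="balanced", random_state=0
    ),
    "random_forest": RandomForestClassifier(
        n_estimators=100, max_depth=8, class_weight="balanced", random_state=0
    ),
}

results = {}
for name, model in models.items():
    # Impute inside the pipeline so medians are learned from training data only.
    pipeline = make_pipeline(SimpleImputer(strategy="median"), model)
    pipeline.fit(X_train, y_train)
    pred = pipeline.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "macro_f1": f1_score(y_test, pred, average="macro"),
        "importances": pipeline[-1].feature_importances_,
    }
    top3 = np.argsort(results[name]["importances"])[::-1][:3]
    print(
        f"{name}: accuracy={results[name]['accuracy']:.3f}, "
        f"macro_f1={results[name]['macro_f1']:.3f}, top features={top3.tolist()}"
    )
```

Both estimators expose impurity-based `feature_importances_`, which makes the two rankings directly comparable. A shallow single tree can also be drawn and read rule by rule, whereas the forest averages many trees and is usually explained through its importance ranking instead.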
Constraints
- Models must be interpretable enough for the marketing team to understand segment characteristics.
- Inference must run in real time, with latency under 200 ms per customer.
- Budget constraints limit the use of extensive hyperparameter tuning techniques.
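The latency constraint can be checked empirically with a small timing harness, sketched below. It times only the model's `predict` call on a single row; feature lookup, imputation, and network overhead are outside its scope and would need to be measured separately. The data is random placeholder input, and the helper name `single_row_latency_ms` and the choice of p50/p95 percentiles are assumptions made for illustration.

```python
import time

import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Placeholder training data (assumption): 2,000 rows, 33 features, stated class shares.
X = rng.normal(size=(2_000, 33))
y = rng.choice(
    ["high_value", "low_value", "new_customer"], size=2_000, p=[0.05, 0.15, 0.80]
)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)


def single_row_latency_ms(model, rows, n_calls=200):
    """Return the (p50, p95) latency in milliseconds of one-row predict calls."""
    timings = []
    for i in range(n_calls):
        row = rows[i % len(rows)].reshape(1, -1)  # one customer per call
        start = time.perf_counter()
        model.predict(row)
        timings.append((time.perf_counter() - start) * 1000.0)
    return np.percentile(timings, [50, 95])


p50, p95 = single_row_latency_ms(model, X)
print(f"p50={p50:.2f} ms, p95={p95:.2f} ms (budget: 200 ms)")
```

The numbers depend on hardware and on the number and depth of trees, so the same harness is worth running against each candidate model on the target serving environment.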