Business Context
RetailTech, a growing e-commerce platform with 1 million active users, aims to enhance its marketing strategies by segmenting customers based on their purchasing behavior. Effective segmentation will allow targeted promotions and personalized experiences, ultimately increasing customer retention and sales.
Dataset
| Feature Group | Count | Examples |
|---|---|---|
| Transactional Data | 50K records | transaction_amount, purchase_frequency, last_purchase_date, avg_cart_value |
| Customer Info | 10 features | age, gender, location, loyalty_status |
- Size: 50,000 customer records, 15 features
- Target: Not applicable (unsupervised learning)
- Feature Types: 8 numerical (transactional metrics), 7 categorical (customer demographics)
- Class Balance: Not applicable
- Missing Data: 5% missing in last_purchase_date, 2% in avg_cart_value
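One possible way to prepare these features for K-means is sketched below. It assumes scikit-learn and pandas are available, uses a tiny made-up table in place of the real data, and covers only a subset of the listed columns. The derived column `days_since_last_purchase`, the `snapshot` reference date, and the choice of median imputation are illustrative assumptions, not part of the brief.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Tiny synthetic stand-in for the real table (all values are made up).
df = pd.DataFrame({
    "transaction_amount": [120.0, 35.5, 410.0, 80.0, 15.0, 230.0],
    "purchase_frequency": [12, 3, 25, 7, 1, 15],
    "last_purchase_date": ["2024-05-01", None, "2024-06-10",
                           "2024-01-15", "2023-11-30", "2024-06-01"],
    "avg_cart_value": [60.0, 35.5, np.nan, 40.0, 15.0, 76.0],
    "gender": ["F", "M", "F", "M", "F", "M"],
    "loyalty_status": ["gold", "basic", "gold", "silver", "basic", "silver"],
})

# Convert the date into a numeric recency feature; missing dates become NaN.
snapshot = pd.Timestamp("2024-07-01")  # assumed reference date
df["days_since_last_purchase"] = (
    snapshot - pd.to_datetime(df["last_purchase_date"])
).dt.days

numeric_cols = ["transaction_amount", "purchase_frequency",
                "avg_cart_value", "days_since_last_purchase"]
categorical_cols = ["gender", "loyalty_status"]

# Numeric columns: fill gaps with the median, then standardize so that no
# single feature dominates the Euclidean distances K-means relies on.
numeric_pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

preprocess = ColumnTransformer(
    [
        ("num", numeric_pipeline, numeric_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ],
    sparse_threshold=0.0,  # always return a dense array
)

X = preprocess.fit_transform(df)
print(X.shape)
```

Standardizing matters here because K-means is distance-based. One-hot columns can still carry a lot of weight when there are many categories, so the categorical handling is worth revisiting on the real data.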
Requirements
- Determine the optimal number of clusters for K-means clustering.
- Implement the K-means algorithm and visualize the clusters.
- Evaluate clustering quality with the silhouette score, and use the elbow method (a plot of inertia against the number of clusters) to support the choice of k.
- Provide a detailed explanation of your approach and findings.
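The requirements above could be approached roughly as follows. This is a minimal sketch assuming scikit-learn, with `make_blobs` data standing in for the preprocessed feature matrix. The candidate range of k (2 to 8) and the rule of picking the k with the highest silhouette score are illustrative choices, not prescribed by the brief.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the preprocessed customer features.
X, _ = make_blobs(n_samples=600, centers=4, n_features=5,
                  cluster_std=1.0, random_state=0)

inertias = {}     # within-cluster sum of squares, used for the elbow plot
silhouettes = {}  # mean silhouette score per candidate k

for k in range(2, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = km.inertia_
    silhouettes[k] = silhouette_score(X, km.labels_)

# Simple selection rule: the k with the highest mean silhouette score.
best_k = max(silhouettes, key=silhouettes.get)

# Refit with the chosen k and project to 2D for a scatter plot.
final = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit(X)
coords = PCA(n_components=2, random_state=0).fit_transform(X)

for k in sorted(inertias):
    print(f"k={k}  inertia={inertias[k]:.1f}  silhouette={silhouettes[k]:.3f}")
print("chosen k:", best_k)
```

Plotting `inertias` against k gives the elbow curve, and `coords` colored by `final.labels_` gives a 2D view of the clusters with any plotting library. The elbow and the silhouette score can disagree, so the final k is usually a judgment call informed by both.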
Constraints
- The solution must be efficient enough to run on the full dataset without excessive computation time.
- Ensure that the clustering results are interpretable for marketing teams to act upon.
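One way to address both constraints is sketched below, assuming scikit-learn. It swaps in `MiniBatchKMeans` (a faster approximation of standard K-means) and a subsampled silhouette score to keep runtime low, then reports cluster centroids in the original units so they read as customer profiles. The three spending profiles, the cluster count, and the sample size are made-up values for illustration only.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
feature_names = ["transaction_amount", "purchase_frequency", "avg_cart_value"]

# Synthetic stand-in: 50,000 customers drawn from three made-up profiles.
profiles = np.array([
    [30.0, 2.0, 25.0],
    [120.0, 10.0, 60.0],
    [400.0, 25.0, 150.0],
])
counts = [25_000, 20_000, 5_000]
raw = np.vstack([
    rng.normal(loc=p, scale=p * 0.15, size=(n, len(p)))
    for p, n in zip(profiles, counts)
])

scaler = StandardScaler()
X = scaler.fit_transform(raw)

# Mini-batch updates trade a little accuracy for much lower runtime.
model = MiniBatchKMeans(n_clusters=3, batch_size=1024, n_init=3,
                        random_state=0).fit(X)

# The full silhouette computation is quadratic in the number of rows,
# so it is estimated here on a random subsample.
score = silhouette_score(X, model.labels_, sample_size=2000, random_state=0)

# Map centroids back to original units for a readable profile per cluster.
centroids = scaler.inverse_transform(model.cluster_centers_)
sizes = np.bincount(model.labels_, minlength=3)

for c in np.argsort(centroids[:, 0]):
    summary = ", ".join(
        f"{name}={value:.1f}" for name, value in zip(feature_names, centroids[c])
    )
    print(f"cluster {c}: {sizes[c]} customers, {summary}")
print(f"sampled silhouette: {score:.3f}")
```

A table of cluster sizes and centroid values in original units is often the easiest artifact for a marketing team to act on, since each row can be given a plain-language label after inspection.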