Business Context
ShopSphere, a mid-sized e-commerce marketplace with 2.5M monthly users, wants to improve customer targeting. The growth team needs both a supervised model to predict which customers will make a purchase in the next 30 days and an unsupervised model to segment customers for lifecycle marketing.
Dataset
You are given a customer-level dataset built from 12 months of activity.
| Feature Group | Count | Examples |
|---|---|---|
| Behavioral | 12 | sessions_last_30d, avg_session_duration, pages_per_session, cart_add_rate |
| Transactional | 8 | orders_last_90d, avg_order_value, discount_usage_rate, return_rate |
| Engagement | 6 | email_open_rate, push_click_rate, days_since_last_visit |
| Demographic / Account | 5 | region, device_type, acquisition_channel, account_age_days |
| Target | 1 | purchased_next_30d |
- Size: 120K customers; 31 feature columns plus the binary target (32 columns in total)
- Target: Binary label indicating whether the customer made at least one purchase in the following 30 days
- Class balance: 28% positive, 72% negative
- Missing data: 10% missing in engagement features, 4% missing in demographic fields
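The class balance and missing-data figures above shape the first preprocessing steps. Below is a minimal standard-library sketch built on three assumptions. Numeric engagement features get median imputation plus a 0/1 missingness flag. Missing demographic fields get an explicit `"unknown"` level. Class weights are "balanced", computed as n_samples / (n_classes × class_count), the formula scikit-learn documents for `class_weight="balanced"`. The helper names and toy values are hypothetical. A production pipeline would more likely use scikit-learn's `SimpleImputer` (with `add_indicator=True`) inside a `ColumnTransformer`.

```python
from statistics import median
from typing import Optional


def impute_numeric(values: list[Optional[float]]) -> tuple[list[float], list[int]]:
    """Median-impute missing numeric values and return a 0/1 missingness flag per row."""
    observed = [v for v in values if v is not None]
    fill = median(observed)  # median is robust to skewed rates such as email_open_rate
    imputed = [fill if v is None else v for v in values]
    flags = [1 if v is None else 0 for v in values]  # lets the model learn from missingness
    return imputed, flags


def impute_categorical(values: list[Optional[str]], placeholder: str = "unknown") -> list[str]:
    """Replace missing categories (e.g. region) with a hypothetical explicit placeholder level."""
    return [placeholder if v is None else v for v in values]


def balanced_class_weights(labels: list[int]) -> dict[int, float]:
    """Weight each class by n_samples / (n_classes * class_count)."""
    n = len(labels)
    classes = sorted(set(labels))
    return {c: n / (len(classes) * labels.count(c)) for c in classes}


# Toy example with the brief's 28% / 72% class balance (100 hypothetical rows).
labels = [1] * 28 + [0] * 72
weights = balanced_class_weights(labels)
print({c: round(w, 3) for c, w in weights.items()})  # {0: 0.694, 1: 1.786}

open_rate = [0.2, None, 0.4, 0.1, None]  # hypothetical email_open_rate values
print(impute_numeric(open_rate))  # ([0.2, 0.2, 0.4, 0.1, 0.2], [0, 1, 0, 0, 1])
print(impute_categorical(["EU", None, "NA"]))  # ['EU', 'unknown', 'NA']
```

With 28% positives, each purchaser counts about 72/28 ≈ 2.6 times as much as a non-purchaser in the training loss.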
Success Criteria
A good solution should:
- Use the two modeling pipelines to clearly explain the difference between supervised and unsupervised learning
- Achieve ROC-AUC >= 0.82 for purchase prediction on the held-out test set
- Produce 3-6 customer segments with interpretable behavioral patterns
- Provide a recommendation for when each approach should be used in production
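The ROC-AUC target can be checked without any ML library by using the rank (Mann-Whitney) formulation. Here AUC is the probability that a randomly chosen purchaser gets a higher score than a randomly chosen non-purchaser, with ties counted as one half. The sketch below assumes the scores are predicted purchase probabilities on the held-out test set. The four-customer example is hypothetical. scikit-learn's `sklearn.metrics.roc_auc_score` computes the same quantity.

```python
def roc_auc(labels: list[int], scores: list[float]) -> float:
    """ROC-AUC via the Mann-Whitney U statistic, using average ranks for tied scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the run of tied scores starting at position i.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC-AUC needs both positive and negative labels")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    # U for the positive class, normalized to [0, 1].
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


TARGET_AUC = 0.82  # from the success criteria

# Hypothetical held-out labels and predicted probabilities.
test_labels = [0, 0, 1, 1]
test_scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(test_labels, test_scores)
print(f"AUC={auc:.3f}", "meets target" if auc >= TARGET_AUC else "below target")
# AUC=0.750 below target
```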
Constraints
- Marketing stakeholders need interpretable outputs, not a black-box-only solution
- Batch scoring must complete in under 10 minutes for 120K customers
- The segmentation approach should be stable enough to refresh monthly
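One way to make the monthly-refresh constraint measurable is to compare each month's segment assignments with the previous month's, using the adjusted Rand index (ARI) on customers present in both runs. ARI is 1.0 when the two groupings are identical, even if segment IDs were renumbered. It is close to 0.0 for chance-level agreement. This is a standard-library sketch with hypothetical segment labels. `sklearn.metrics.adjusted_rand_score` implements the same index. The brief does not specify an acceptable ARI threshold, so that remains a business decision.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a: list[int], labels_b: list[int]) -> float:
    """Chance-corrected agreement between two segmentations of the same customers."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both runs must label the same customers in the same order")
    n = len(labels_a)
    if n < 2:
        return 1.0  # trivially identical
    # Pairs of customers grouped together in both runs (contingency-table cells).
    index = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. one segment in both runs
        return 1.0
    return (index - expected) / (max_index - expected)


# Hypothetical segment IDs for the same 8 customers in two monthly refreshes.
march = [0, 0, 1, 1, 2, 2, 2, 0]
april = [1, 1, 0, 0, 2, 2, 2, 1]  # same grouping, segments renumbered
print(adjusted_rand_index(march, april))  # 1.0
```

Because ARI ignores label permutations, a refresh that only renumbers segments still scores 1.0. Segment names in marketing tools would still need a mapping step, for example matching each new centroid to the closest old one.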
Deliverables
- Build a supervised learning pipeline for purchase prediction
- Build an unsupervised learning pipeline for customer segmentation
- Compare the goals, inputs, outputs, and evaluation of both approaches
- Explain feature engineering, preprocessing, and model selection decisions
- Report metrics and business recommendations for deployment
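The core contrast the deliverables ask for can be shown on a single toy feature matrix. In the minimal sketch below, logistic regression is one interpretable supervised option, trained here by batch gradient descent on log-loss. It consumes both the features and the `purchased_next_30d` labels and outputs a purchase probability per customer. k-means is an unsupervised option. It consumes only the features and outputs a segment ID. This version uses deterministic farthest-first initialization rather than the more common k-means++. The two features are hypothetical standardized versions of sessions_last_30d and orders_last_90d, and the six rows are made up. A real pipeline would more likely use scikit-learn's `LogisticRegression` and `KMeans`.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def sq_dist(a: list[float], b: list[float]) -> float:
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def train_logistic_regression(X, y, lr=0.1, epochs=500):
    """Supervised: learns from the features X AND the labels y."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi  # d(log-loss)/d(logit)
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Output: probability that this customer purchases in the next 30 days."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def k_means(X, k, iters=20):
    """Unsupervised: groups rows of X by similarity; no label is ever seen."""
    # Farthest-first initialization: deterministic, so reruns reproduce the same segments.
    centroids = [list(X[0])]
    while len(centroids) < k:
        centroids.append(list(max(X, key=lambda x: min(sq_dist(x, c) for c in centroids))))
    assign = [0] * len(X)
    for _ in range(iters):
        # Assignment step: each customer joins its nearest centroid.
        assign = [min(range(k), key=lambda c: sq_dist(x, centroids[c])) for x in X]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [x for x, a in zip(X, assign) if a == c]
            if members:  # keep the previous centroid if a segment empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centroids


# Hypothetical standardized features: [sessions_last_30d, orders_last_90d]
X = [[-1.0, -1.2], [-0.8, -1.0], [-1.1, -0.9], [1.0, 0.9], [1.2, 1.1], [0.9, 1.2]]
y = [0, 0, 0, 1, 1, 1]  # purchased_next_30d, used only by the supervised model

w, b = train_logistic_regression(X, y)
print("weights:", [round(v, 2) for v in w])  # per-feature effects on the log-odds
print("P(purchase) for an active customer:", round(predict_proba(w, b, [1.0, 1.0]), 2))

segments, centroids = k_means(X, k=2)
print("segments:", segments)  # segments: [0, 0, 0, 1, 1, 1]
```

The supervised output can be scored against held-out labels with ROC-AUC. The unsupervised output has no ground truth, so it is judged on how interpretable each segment's centroid values are and on how stable the segments stay across refreshes.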