Business Context
ShopSphere, a mid-sized e-commerce marketplace with 2.4M registered users, wants to improve its lifecycle marketing. The growth team needs two things: (1) customer segments for campaign design and (2) a model that predicts which customers will purchase in the next 30 days.
Dataset
You are given a customer-level dataset built from the last 12 months of activity.
| Feature Group | Count | Examples |
|---|---|---|
| Behavioral | 12 | sessions_30d, avg_session_duration, cart_add_rate, wishlist_events |
| Transactional | 10 | orders_90d, avg_order_value, refund_rate, days_since_last_purchase |
| Marketing | 6 | email_opens_30d, push_click_rate, coupon_redemptions |
| Customer profile | 7 | country, device_type, acquisition_channel, loyalty_tier |
| Target label | 1 | purchased_next_30d |
- Size: 180K customers, 35 input features
- Target: Binary label indicating whether the customer makes at least one purchase in the next 30 days
- Class balance: 28% positive, 72% negative
- Missing data: 8% missing in marketing engagement fields, 3% missing in profile fields
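As a quick sanity check on the class balance, the arithmetic below shows what two trivial baselines would score. It is a sketch that uses only the figures stated in the dataset description; the variable names are illustrative and not part of the dataset.

```python
# Figures taken from the dataset description; names are illustrative.
n_customers = 180_000
positive_rate = 0.28

n_pos = round(n_customers * positive_rate)  # customers who purchase
n_neg = n_customers - n_pos                 # customers who do not

# Baseline 1: predict "no purchase" for everyone.
# Accuracy looks decent, but no positives are found, so F1 is 0.
acc_all_negative = n_neg / n_customers

# Baseline 2: predict "purchase" for everyone.
# Recall is perfect, but precision equals the positive rate.
precision_all_positive = n_pos / n_customers
recall_all_positive = 1.0
f1_all_positive = (
    2 * precision_all_positive * recall_all_positive
    / (precision_all_positive + recall_all_positive)
)

print(f"all-negative accuracy: {acc_all_negative:.2f}")
print(f"all-positive F1:       {f1_all_positive:.4f}")
```

Neither baseline is useful, which is one reason accuracy alone is a weak yardstick for this label and why threshold-aware metrics matter.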
Success Criteria
A strong solution should:
- Build a supervised model with ROC-AUC >= 0.82 and F1 >= 0.68 on the held-out test set
- Produce unsupervised customer segments that are stable and interpretable enough for marketing use
- Clearly explain the difference between supervised and unsupervised learning through the chosen methods and outputs
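The supervised criteria above can be checked mechanically. The following is a minimal sketch, assuming binary 0/1 labels and model scores in [0, 1]; the function names, the 0.5 decision threshold, and the toy data are assumptions made for illustration. The pairwise ROC-AUC is quadratic in the number of rows, so it is meant to show the definition rather than to run on 180K customers, where a library implementation would be the usual choice.

```python
def roc_auc(y_true, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def f1_binary(y_true, y_pred):
    """F1 for the positive class from hard 0/1 predictions."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def check_success_criteria(y_true, scores, threshold=0.5, min_auc=0.82, min_f1=0.68):
    """Compare held-out metrics with the targets from the success criteria."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    auc = roc_auc(y_true, scores)
    f1 = f1_binary(y_true, y_pred)
    return {"roc_auc": auc, "f1": f1, "passed": auc >= min_auc and f1 >= min_f1}


# Toy held-out set (hypothetical values, not real results).
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.2, 0.7, 0.4, 0.1, 0.3, 0.6, 0.05]
report = check_success_criteria(y_true, scores)
print(report)
```

ROC-AUC depends only on how the scores rank customers, while F1 depends on the chosen threshold, so a model can clear one target and miss the other. In practice the threshold would be tuned on a validation split, not on the test set.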
Constraints
- Batch scoring must finish in under 10 minutes for 180K customers
- Marketing stakeholders need interpretable segment definitions
- The solution should avoid leakage from future purchase behavior
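One common way to honor the leakage constraint is to fix a cutoff date per training snapshot, compute features only from events before the cutoff, and compute the label only from the 30 days that follow. The sketch below assumes a per-customer list of purchase dates; `build_example`, the cutoff, and the sample dates are hypothetical, while `orders_90d`, `days_since_last_purchase`, and `purchased_next_30d` are the names used in the dataset description.

```python
from datetime import date, timedelta


def build_example(purchase_dates, cutoff, horizon_days=30, lookback_days=90):
    """Features see only purchases before `cutoff`; the label sees only
    purchases in [cutoff, cutoff + horizon_days)."""
    lookback_start = cutoff - timedelta(days=lookback_days)
    horizon_end = cutoff + timedelta(days=horizon_days)

    # Strictly-before-cutoff history is the only input to the features.
    past = [d for d in purchase_dates if d < cutoff]
    orders_90d = sum(1 for d in past if d >= lookback_start)
    days_since_last_purchase = (cutoff - max(past)).days if past else None

    # The label is the only place where post-cutoff data is used.
    purchased_next_30d = int(any(cutoff <= d < horizon_end for d in purchase_dates))

    return {
        "orders_90d": orders_90d,
        "days_since_last_purchase": days_since_last_purchase,
        "purchased_next_30d": purchased_next_30d,
    }


# Hypothetical customer history and snapshot date.
cutoff = date(2024, 6, 1)
purchases = [date(2024, 3, 15), date(2024, 5, 20), date(2024, 6, 10)]
example = build_example(purchases, cutoff)
print(example)
```

The same rule applies to any preprocessing that is fit on data, such as imputation values or scaling statistics: fitting them on training rows only keeps information from the evaluation period out of the model.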
Deliverables
- Train a supervised learning model to predict purchased_next_30d
- Build an unsupervised learning workflow to segment customers
- Compare the goals, inputs, outputs, and evaluation methods of both approaches
- Describe feature engineering, validation, and production deployment choices
- Provide code, metrics, and a short recommendation on when to use each approach
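To make the supervised versus unsupervised comparison concrete, here is a deliberately tiny sketch. It contrasts a one-feature threshold rule that is fit with labels against k-means (Lloyd's algorithm) that never sees them. Both are stand-ins chosen for brevity, not recommended final models, and the six scaled data points are hypothetical.

```python
import math


def fit_stump(values, y):
    """Supervised: pick the threshold on one feature that maximizes
    training accuracy. Requires the labels y."""
    best_t, best_acc = None, -1.0
    for t in sorted(set(values)):
        acc = sum(int(v >= t) == label for v, label in zip(values, y)) / len(y)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


def kmeans(points, k, n_iter=20):
    """Unsupervised: Lloyd's algorithm, using the first k points as
    initial centroids (deterministic, for illustration). Uses no labels."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical, already-scaled features: (sessions_30d, orders_90d).
X = [(0.1, 0.0), (0.9, 0.8), (0.2, 0.1), (0.8, 0.9), (0.15, 0.05), (0.85, 0.95)]
y = [0, 1, 0, 1, 0, 1]  # purchased_next_30d, used only by the supervised model

threshold = fit_stump([row[1] for row in X], y)  # learns from labels
segments, centroids = kmeans(X, k=2)             # learns from X alone

print("supervised threshold on orders_90d:", threshold)
print("segment assignments:", segments)
print("segment centroids:", centroids)
```

The supervised output is a prediction that can be scored against held-out labels, for example with ROC-AUC and F1. The unsupervised output is a set of segment assignments and centroids with no ground truth to compare against, so it is judged by stability across reruns and by whether the centroid profiles make sense to marketing.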