Business Context
BrightCart, a mid-market e-commerce platform with 2 million monthly active users, wants a simple way to explain machine learning to non-technical executives. The analytics team has two active projects: predicting which customers will churn and discovering natural customer segments for marketing.
Dataset
You are given a customer analytics dataset that can support both a supervised learning task and an unsupervised learning task. Your job is to build representative models for each and prepare an explanation that a non-technical stakeholder could understand.
| Feature Group | Count | Examples |
|---|---|---|
| Behavioral | 12 | sessions_30d, avg_order_value, days_since_last_purchase, returns_rate |
| Demographic | 6 | country, device_type, acquisition_channel, loyalty_tier |
| Engagement | 5 | email_open_rate, push_click_rate, app_installs, support_tickets |
| Derived | 7 | purchase_frequency_90d, discount_share, recency_bucket, tenure_days |
- Size: 240K customers, 30 features
- Target for supervised task: churned_60d (1 if the customer made no purchase in the next 60 days, else 0)
- Unsupervised task: no target label; identify customer segments from historical behavior
- Class balance: 18% churned, 82% retained
- Missing data: 8% missing in engagement features, 3% missing in demographic fields
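The class balance above has a practical consequence that is easy to show with arithmetic. The sketch below uses only the figures stated in this section (240K customers, 18% churned) and two deliberately naive "models" invented here for illustration. It suggests why accuracy alone would be a misleading yardstick on this dataset.

```python
# Figures taken from the dataset description above.
N_CUSTOMERS = 240_000
CHURN_RATE = 0.18

churned = round(N_CUSTOMERS * CHURN_RATE)   # 43,200 customers
retained = N_CUSTOMERS - churned            # 196,800 customers


def f1_from_counts(tp, fp, fn):
    """F1 from true positives, false positives, and false negatives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Naive rule A (illustrative): predict "retained" for everyone.
# Accuracy looks respectable, yet no churner is ever found.
acc_always_retained = retained / N_CUSTOMERS                      # 0.82
f1_always_retained = f1_from_counts(tp=0, fp=0, fn=churned)       # 0.0

# Naive rule B (illustrative): flag every customer as a churner.
# Recall is 1.0 but precision is only 0.18.
f1_always_churn = f1_from_counts(tp=churned, fp=retained, fn=0)   # about 0.305

print(f"always-retained: accuracy={acc_always_retained:.2f}, F1={f1_always_retained:.2f}")
print(f"always-churn:    F1={f1_always_churn:.3f}")
```

Neither trivial rule reaches an F1 of 0.55, so an F1 threshold at that level can only be met by a model that actually separates churners from retained customers.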
Success Criteria
- Build a supervised baseline that achieves ROC-AUC >= 0.78 and F1 >= 0.55 on held-out data.
- Build an unsupervised segmentation with silhouette score >= 0.20 and clear business interpretation.
- Produce a stakeholder-friendly explanation of the difference between the two approaches in plain language.
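To make the three metrics named above concrete, here is a minimal pure-Python sketch of how each can be computed. The toy labels, scores, and 2-D points are invented for illustration and are not drawn from the BrightCart dataset. In practice a library implementation would normally be used instead.

```python
from math import dist


def roc_auc(y_true, scores):
    """Chance that a random churner is scored above a random non-churner.
    Ties count as half a win."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive (churn) class."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def mean_distance(p, others):
    return sum(dist(p, q) for q in others) / len(others)


def silhouette(points, labels):
    """Mean of (b - a) / max(a, b) over all points, where a is the mean
    distance to the point's own cluster and b is the mean distance to the
    nearest other cluster. Assumes at least two clusters."""
    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        same = [q for j, q in enumerate(points) if labels[j] == lab and j != i]
        if not same:  # singleton cluster: score 0 by convention
            scores.append(0.0)
            continue
        a = mean_distance(p, same)
        b = min(
            mean_distance(p, [q for q, l in zip(points, labels) if l == other])
            for other in set(labels) if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy supervised example: 1 = churned, 0 = retained.
y_true = [1, 1, 0, 1, 0, 0, 0, 0]
y_score = [0.9, 0.7, 0.6, 0.4, 0.3, 0.2, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in y_score]  # 0.5 cutoff is an assumption

auc = roc_auc(y_true, y_score)   # 14 of 15 pairs ranked correctly, about 0.933
f1 = f1_score(y_true, y_pred)    # tp=2, fp=1, fn=1, about 0.667

# Toy unsupervised example: two well-separated groups of points.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
clusters = [0, 0, 0, 1, 1, 1]
sil = silhouette(points, clusters)

print(f"ROC-AUC={auc:.3f}  F1={f1:.3f}  silhouette={sil:.3f}")
```

ROC-AUC and F1 both need the true labels, while the silhouette score is computed from the points and their assigned clusters alone. That is the evaluation-side difference between the two tasks.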
Constraints
- Explanations must be understandable to a VP-level audience.
- Batch scoring only; nightly inference on the full customer base.
- Prefer models that are reasonably interpretable and easy to maintain.
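One way to picture the batch-scoring and interpretability constraints together is a linear (logistic) scorer applied to the customer base in chunks. The following is a rough sketch only: the intercept, the weights, the `customer_id` field, and the batch size are hypothetical values chosen for illustration, not fitted coefficients or part of the brief.

```python
from itertools import islice
from math import exp

# Hypothetical coefficients, for illustration only. In a linear model the sign
# of each weight reads directly: more days since the last purchase pushes the
# churn score up, more sessions pushes it down.
INTERCEPT = -2.0
WEIGHTS = {"days_since_last_purchase": 0.04, "sessions_30d": -0.10}


def churn_probability(row):
    """Logistic score in (0, 1) from a weighted sum of features."""
    z = INTERCEPT + sum(w * row[name] for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + exp(-z))


def batches(rows, size):
    """Yield successive lists of at most `size` rows from any iterable."""
    it = iter(rows)
    while chunk := list(islice(it, size)):
        yield chunk


def score_all(rows, batch_size=10_000):
    """Score every customer, one batch at a time, as a nightly job might."""
    for chunk in batches(rows, batch_size):
        for row in chunk:
            yield row["customer_id"], churn_probability(row)


# Made-up rows; `customer_id` is an assumed key, not a documented column.
rows = [
    {"customer_id": "c1", "days_since_last_purchase": 5, "sessions_30d": 20},
    {"customer_id": "c2", "days_since_last_purchase": 70, "sessions_30d": 1},
    {"customer_id": "c3", "days_since_last_purchase": 30, "sessions_30d": 6},
]
scores = dict(score_all(rows, batch_size=2))
for customer_id, p in scores.items():
    print(customer_id, round(p, 3))
```

Processing in fixed-size batches keeps memory use bounded regardless of how many customers are scored in one run.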
Deliverables
- Train one supervised model to predict churn.
- Train one unsupervised model to segment customers.
- Compare inputs, outputs, and evaluation methods for both approaches.
- Write a short stakeholder explanation of supervised vs. unsupervised learning.
- Recommend when the business should use each approach in production.
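The input/output comparison in the deliverables can be illustrated with a very small side-by-side sketch. It swaps in two deliberately simple techniques, a nearest-centroid classifier and k-means, purely to show the contrast. They are not the models this brief prescribes, and the six toy customers (days_since_last_purchase, sessions_30d) are invented values.

```python
from math import dist


def mean_point(points):
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


# Supervised: features AND known labels go in; a label predictor comes out.
def fit_nearest_centroid(X, y):
    return {
        label: mean_point([x for x, lab in zip(X, y) if lab == label])
        for label in set(y)
    }


def predict_nearest_centroid(centroids, x):
    return min(centroids, key=lambda label: dist(x, centroids[label]))


# Unsupervised: only features go in; group assignments come out.
def kmeans(X, k, n_iter=20):
    # Deterministic start from the first k points, to keep the sketch
    # reproducible. Real implementations use smarter initialisation.
    centroids = list(X[:k])

    def assign_all():
        return [min(range(k), key=lambda c: dist(x, centroids[c])) for x in X]

    for _ in range(n_iter):
        assignment = assign_all()
        for c in range(k):
            members = [x for x, a in zip(X, assignment) if a == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = mean_point(members)
    return assign_all(), centroids


# Toy customers: (days_since_last_purchase, sessions_30d). Invented values.
X = [(5, 20), (8, 18), (6, 22), (60, 2), (75, 1), (66, 3)]
y = [0, 0, 0, 1, 1, 1]  # 1 = churned; only the supervised model sees this

model = fit_nearest_centroid(X, y)
prediction = predict_nearest_centroid(model, (70, 2))  # predicts a known label

segments, centers = kmeans(X, k=2)  # produces unnamed groups

print("supervised prediction for (70, 2):", prediction)
print("unsupervised segments:", segments)
```

The supervised model answers a question the business already asked ("will this customer churn?"), whereas the segment numbers returned by k-means carry no meaning until an analyst inspects each group and names it.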