## Business Context
You’re working on BazaarOne, a two-sided marketplace similar to Etsy + DoorDash for local services and goods. The platform has 12M monthly active buyers, 1.8M active sellers, and processes ~90M sessions/day across web and mobile. Leadership believes the marketplace experience is “one-size-fits-all”: search ranking, promotions, and seller onboarding are identical for everyone, leading to stagnant conversion and uneven liquidity (some categories are oversupplied while others face long buyer wait times).
Your task is to use clustering to create actionable segments that can drive (a) buyer personalization, (b) seller growth programs, and (c) marketplace health monitoring. The output must be stable enough to be used in downstream systems (CRM targeting, ranking features, experimentation).
## Dataset
You are given 90 days of event and transaction data aggregated to a user-week grain (to reduce noise and support weekly refreshes). Buyers and sellers can overlap (a user can be both).
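For concreteness, the roll-up to this grain might look like the following pandas sketch (the event table, its path, and columns such as `event_ts` and `order_value` are hypothetical):

```python
import pandas as pd

# Minimal sketch of the user-week roll-up. The input table, its path, and
# its columns (user_id, event_ts, event_type, order_value) are hypothetical.
events = pd.read_parquet("events_90d.parquet")

# Bucket each event into the Monday-anchored week it falls in.
events["week"] = events["event_ts"].dt.to_period("W").dt.start_time

user_weeks = (
    events.groupby(["user_id", "week"])
    .agg(
        sessions=("event_type", lambda s: s.eq("session_start").sum()),
        searches=("event_type", lambda s: s.eq("search").sum()),
        purchase_count=("event_type", lambda s: s.eq("purchase").sum()),
        avg_order_value=("order_value", "mean"),
    )
    .reset_index()
)
```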
| Feature Group | Count | Examples | Notes |
|---|---|---|---|
| Buyer behavior | 18 | sessions, searches, add_to_cart_rate, purchase_count, avg_order_value, category_entropy | Heavy-tailed; many zeros |
| Seller behavior | 16 | listings_active, price_median, response_time_p50, cancellation_rate, fulfillment_sla_hit_rate | Missing for non-sellers |
| Marketplace interactions | 10 | messages_sent, disputes_opened, refunds, promo_redemptions, returns_rate | Sparse, spiky |
| Geography & device | 8 | country, metro, rural_flag, device_os, app_version | Categorical |
| Temporal | 6 | week_of_year, days_since_signup, seasonality_index, holiday_week_flag | Potential leakage if not careful |
- Size: ~220M user-weeks (≈ 18M unique users × 12–13 weeks), 58 features after initial aggregation
- Target: None (unsupervised), but you will evaluate clusters via proxy business metrics
- Class balance: Not applicable; however, ~70% of user-weeks have no purchase, and ~85% have no seller activity
- Missing data: Seller features missing for ~90% of users; some telemetry missing (~3–5%) due to client logging drops
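The two flavors of missingness deserve different treatment. A sketch of one reasonable encoding (assuming `user_weeks` from the roll-up above now carries the seller columns; the fill strategy itself is an assumption):

```python
# Structural missingness: seller features are NaN for non-sellers, so a
# binary flag carries the real signal and lets us fill the NaNs without
# inventing seller behavior for buyers.
seller_cols = ["listings_active", "price_median", "response_time_p50",
               "cancellation_rate", "fulfillment_sla_hit_rate"]
user_weeks["is_seller"] = user_weeks[seller_cols].notna().any(axis=1).astype(int)
user_weeks[seller_cols] = user_weeks[seller_cols].fillna(0.0)

# Random telemetry drops (~3-5%) are different: median-fill so downstream
# scaling isn't skewed by a few client logging gaps.
telemetry_cols = ["sessions", "searches"]  # hypothetical subset
user_weeks[telemetry_cols] = user_weeks[telemetry_cols].fillna(
    user_weeks[telemetry_cols].median()
)
```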
## Success Criteria
Your clustering solution is considered successful if it:
- Produces 5–20 clusters that are interpretable and actionable (clear narratives like “high-intent bargain hunters” or “new sellers with slow response times”).
- Is stable week-over-week: for returning users, at least 80% keep the same cluster label (or map cleanly) after a weekly refresh.
- Demonstrates business separation: clusters show meaningful differences in at least two KPIs (e.g., conversion rate, AOV, refund rate, seller cancellation rate), with effect sizes of ≥ 15–20% between the top and bottom clusters (both this and the stability check are sketched after this list).
- Can be computed in a weekly batch job with a budget of < 2 hours on a typical Spark/Databricks cluster or a single multi-core machine for a sampled training set.
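Both headline criteria are cheap to verify directly. A sketch, assuming label tables from two consecutive weekly runs joined to KPI columns (file paths and KPI column names are hypothetical):

```python
import pandas as pd

# Label tables from consecutive weekly runs; paths and columns hypothetical.
prev = pd.read_parquet("labels_week_t.parquet")
curr = pd.read_parquet("labels_week_t_plus_1.parquet")

# Week-over-week stability: share of returning users who keep their label.
both = prev.merge(curr, on="user_id", suffixes=("_prev", "_curr"))
retention = (both["cluster_prev"] == both["cluster_curr"]).mean()
print(f"label retention: {retention:.1%} (target >= 80%)")

# Business separation: relative top-vs-bottom gap per KPI.
for kpi in ["conversion_rate", "avg_order_value", "refund_rate"]:
    by_cluster = curr.groupby("cluster")[kpi].mean()
    base = by_cluster.min()
    gap = (by_cluster.max() - base) / base if base > 0 else float("inf")
    print(f"{kpi}: top-vs-bottom gap {gap:.1%} (target >= 15-20%)")
```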
## Constraints
- Scale: You cannot run expensive O(N^2) algorithms on 220M rows. You may train on a representative sample and then assign clusters to all users.
- Mixed feature types: numerical + categorical + sparse/zero-inflated features.
- Outliers: whales and power sellers dominate raw counts (a preprocessing sketch addressing both of these follows this list).
- Production: clusters must be assignable for new users with partial history (cold start) and refreshed weekly.
- Interpretability: product and ops teams need explanations; “cluster 7” is not acceptable without a profile.
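One way to address the mixed-type and outlier constraints in a single assignable transformer, sketched with scikit-learn (the column lists are hypothetical subsets of the 58 features):

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, OneHotEncoder, RobustScaler

# Hypothetical subsets of the 58 features, grouped by treatment.
count_cols = ["sessions", "searches", "purchase_count", "listings_active"]
rate_cols = ["add_to_cart_rate", "cancellation_rate", "returns_rate"]
cat_cols = ["country", "device_os", "rural_flag"]

preprocess = ColumnTransformer([
    # log1p tames the heavy tails from whales and power sellers; RobustScaler
    # centers on medians/IQR so residual outliers don't dominate distances.
    ("counts", Pipeline([
        ("log", FunctionTransformer(np.log1p, feature_names_out="one-to-one")),
        ("scale", RobustScaler()),
    ]), count_cols),
    ("rates", RobustScaler(), rate_cols),
    # handle_unknown="ignore" guards cold-start users with unseen categories.
    ("cats", OneHotEncoder(handle_unknown="ignore", sparse_output=False), cat_cols),
])
```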
## Deliverables
- Propose an end-to-end clustering approach for this marketplace (feature design, algorithm choice, scaling strategy).
- Describe how you will choose the number of clusters and validate cluster quality beyond silhouette score.
- Explain how you will ensure stability over time and handle cluster drift.
- Provide a plan to operationalize clusters (batch scoring, monitoring, and how downstream teams use them).
- Write Python code (scikit-learn) that trains on a sample, evaluates clusters with business proxies, and assigns clusters to new data.
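As a starting point for that last deliverable, a minimal sketch that trains on a sample, profiles clusters with business proxies, and scores new data (file paths, the pre-drawn sample, and k=12 are placeholders; `preprocess` is the transformer sketched under Constraints):

```python
import pandas as pd
from sklearn.cluster import MiniBatchKMeans
from sklearn.pipeline import Pipeline

# Train on a representative sample drawn upstream (e.g., in Spark); the
# full 220M rows never need to fit in memory here. Path is hypothetical.
train = pd.read_parquet("user_weeks_sample.parquet")

model = Pipeline([
    ("prep", preprocess),  # transformer from the Constraints sketch
    ("km", MiniBatchKMeans(n_clusters=12, batch_size=10_000,
                           n_init=5, random_state=42)),
])
model.fit(train)

# Business-proxy profile: per-cluster KPI means are what turn "cluster 7"
# into a narrative product and ops teams can act on.
train["cluster"] = model.predict(train)
print(train.groupby("cluster")[["purchase_count", "avg_order_value",
                                "returns_rate"]].mean())

# Weekly refresh / cold start: new user-weeks (even partial histories) get
# labels from the frozen model; retrain only when drift monitoring fires.
new_week = pd.read_parquet("user_weeks_latest.parquet")  # hypothetical path
new_week["cluster"] = model.predict(new_week)
```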