## Business Context
You’re working on BazaarOne, a two-sided marketplace similar to Etsy + DoorDash for local services and goods. The platform has 12M monthly active buyers, 1.8M active sellers, and processes ~90M sessions/day across web and mobile. Leadership believes the marketplace experience is “one-size-fits-all”: search ranking, promotions, and seller onboarding are identical for everyone, leading to stagnant conversion and uneven liquidity (some categories are oversupplied while others face long buyer wait times).
Your task is to use clustering to create actionable segments that can drive (a) buyer personalization, (b) seller growth programs, and (c) marketplace health monitoring. The output must be stable enough to be used in downstream systems (CRM targeting, ranking features, experimentation).
## Dataset
You are given 90 days of event and transaction data aggregated to a user-week grain (to reduce noise and support weekly refreshes). Buyers and sellers can overlap (a user can be both).
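For concreteness, the roll-up to this grain might look like the following pandas sketch (the event table, its path, and columns such as `event_ts` and `order_value` are hypothetical):

```python
import pandas as pd

# Minimal sketch of the user-week roll-up. The input table, its path, and
# its columns (user_id, event_ts, event_type, order_value) are hypothetical.
events = pd.read_parquet("events_90d.parquet")

# Bucket each event into the Monday-anchored week it falls in.
events["week"] = events["event_ts"].dt.to_period("W").dt.start_time

user_weeks = (
    events.groupby(["user_id", "week"])
    .agg(
        sessions=("event_type", lambda s: s.eq("session_start").sum()),
        searches=("event_type", lambda s: s.eq("search").sum()),
        purchase_count=("event_type", lambda s: s.eq("purchase").sum()),
        avg_order_value=("order_value", "mean"),
    )
    .reset_index()
)
```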
| Feature Group | Count | Examples | Notes |
|---|---|---|---|
| Buyer behavior | 18 | sessions, searches, add_to_cart_rate, purchase_count, avg_order_value, category_entropy | Heavy-tailed; many zeros |
| Seller behavior | 16 | listings_active, price_median, response_time_p50, cancellation_rate, fulfillment_sla_hit_rate | Missing for non-sellers |
| Marketplace interactions | 10 | messages_sent, disputes_opened, refunds, promo_redemptions, returns_rate | Sparse, spiky |
| Geography & device | 8 | country, metro, rural_flag, device_os, app_version | Categorical |
| Temporal | 6 | week_of_year, days_since_signup, seasonality_index, holiday_week_flag | Potential leakage if not careful |
- Size: ~220M user-weeks (≈ 18M unique users × 12–13 weeks), 58 features after initial aggregation
- Target: None (unsupervised), but you will evaluate clusters via proxy business metrics
- Class balance: Not applicable; however, ~70% of user-weeks have no purchase, and ~85% have no seller activity
- Missing data: Seller features missing for ~90% of users; some telemetry missing (~3–5%) due to client logging drops
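The two flavors of missingness deserve different treatment. A sketch of one reasonable encoding (assuming `user_weeks` from the roll-up above now carries the seller columns; the fill strategy itself is an assumption):

```python
# Structural missingness: seller features are NaN for non-sellers, so a
# binary flag carries the real signal and lets us fill the NaNs without
# inventing seller behavior for buyers.
seller_cols = ["listings_active", "price_median", "response_time_p50",
               "cancellation_rate", "fulfillment_sla_hit_rate"]
user_weeks["is_seller"] = user_weeks[seller_cols].notna().any(axis=1).astype(int)
user_weeks[seller_cols] = user_weeks[seller_cols].fillna(0.0)

# Random telemetry drops (~3-5%) are different: median-fill so downstream
# scaling isn't skewed by a few client logging gaps.
telemetry_cols = ["sessions", "searches"]  # hypothetical subset
user_weeks[telemetry_cols] = user_weeks[telemetry_cols].fillna(
    user_weeks[telemetry_cols].median()
)
```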
## Success Criteria
Your clustering solution is considered successful if it:
- Produces 5–20 clusters that are interpretable and actionable (clear narratives like “high-intent bargain hunters” or “new sellers with slow response times”).
- Is stable week-over-week: for returning users, at least 80% keep the same cluster label (or map cleanly) after a weekly refresh.
- Demonstrates business separation: clusters show meaningful differences in at least two KPIs (e.g., conversion rate, AOV, refund rate, seller cancellation rate), with effect sizes of ≥ 15–20% between the top and bottom clusters (both this and the stability check are sketched after this list).
- Can be computed in a weekly batch job with a budget of < 2 hours on a typical Spark/Databricks cluster or a single multi-core machine for a sampled training set.
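Both headline criteria are cheap to verify directly. A sketch, assuming label tables from two consecutive weekly runs joined to KPI columns (file paths and KPI column names are hypothetical):

```python
import pandas as pd

# Label tables from consecutive weekly runs; paths and columns hypothetical.
prev = pd.read_parquet("labels_week_t.parquet")
curr = pd.read_parquet("labels_week_t_plus_1.parquet")

# Week-over-week stability: share of returning users who keep their label.
both = prev.merge(curr, on="user_id", suffixes=("_prev", "_curr"))
retention = (both["cluster_prev"] == both["cluster_curr"]).mean()
print(f"label retention: {retention:.1%} (target >= 80%)")

# Business separation: relative top-vs-bottom gap per KPI.
for kpi in ["conversion_rate", "avg_order_value", "refund_rate"]:
    by_cluster = curr.groupby("cluster")[kpi].mean()
    base = by_cluster.min()
    gap = (by_cluster.max() - base) / base if base > 0 else float("inf")
    print(f"{kpi}: top-vs-bottom gap {gap:.1%} (target >= 15-20%)")
```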
## Constraints
- Scale: You cannot run expensive O(N^2) algorithms on 220M rows. You may train on a representative sample and then assign clusters to all users.
- Mixed feature types: numerical + categorical + sparse/zero-inflated features.
- Outliers: whales and power sellers dominate raw counts (a preprocessing sketch addressing both of these follows this list).
- Production: clusters must be assignable for new users with partial history (cold start) and refreshed weekly.
- Interpretability: product and ops teams need explanations; “cluster 7” is not acceptable without a profile.
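One way to address the mixed-type and outlier constraints in a single assignable transformer, sketched with scikit-learn (the column lists are hypothetical subsets of the 58 features):

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, OneHotEncoder, RobustScaler

# Hypothetical subsets of the 58 features, grouped by treatment.
count_cols = ["sessions", "searches", "purchase_count", "listings_active"]
rate_cols = ["add_to_cart_rate", "cancellation_rate", "returns_rate"]
cat_cols = ["country", "device_os", "rural_flag"]

preprocess = ColumnTransformer([
    # log1p tames the heavy tails from whales and power sellers; RobustScaler
    # centers on medians/IQR so residual outliers don't dominate distances.
    ("counts", Pipeline([
        ("log", FunctionTransformer(np.log1p, feature_names_out="one-to-one")),
        ("scale", RobustScaler()),
    ]), count_cols),
    ("rates", RobustScaler(), rate_cols),
    # handle_unknown="ignore" guards cold-start users with unseen categories.
    ("cats", OneHotEncoder(handle_unknown="ignore", sparse_output=False), cat_cols),
])
```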
## Deliverables
- Propose an end-to-end clustering approach for this marketplace (feature design, algorithm choice, scaling strategy).
- Describe how you will choose the number of clusters and validate cluster quality beyond silhouette score.
- Explain how you will ensure stability over time and handle cluster drift.
- Provide a plan to operationalize clusters (batch scoring, monitoring, and how downstream teams use them).
- Write Python code (scikit-learn) that trains on a sample, evaluates clusters with business proxies, and assigns clusters to new data.
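As a starting point for that last deliverable, a minimal sketch that trains on a sample, profiles clusters with business proxies, and scores new data (file paths, the pre-drawn sample, and k=12 are placeholders; `preprocess` is the transformer sketched under Constraints):

```python
import pandas as pd
from sklearn.cluster import MiniBatchKMeans
from sklearn.pipeline import Pipeline

# Train on a representative sample drawn upstream (e.g., in Spark); the
# full 220M rows never need to fit in memory here. Path is hypothetical.
train = pd.read_parquet("user_weeks_sample.parquet")

model = Pipeline([
    ("prep", preprocess),  # transformer from the Constraints sketch
    ("km", MiniBatchKMeans(n_clusters=12, batch_size=10_000,
                           n_init=5, random_state=42)),
])
model.fit(train)

# Business-proxy profile: per-cluster KPI means are what turn "cluster 7"
# into a narrative product and ops teams can act on.
train["cluster"] = model.predict(train)
print(train.groupby("cluster")[["purchase_count", "avg_order_value",
                                "returns_rate"]].mean())

# Weekly refresh / cold start: new user-weeks (even partial histories) get
# labels from the frozen model; retrain only when drift monitoring fires.
new_week = pd.read_parquet("user_weeks_latest.parquet")  # hypothetical path
new_week["cluster"] = model.predict(new_week)
```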