Segment and Predict Retail Customers

Easy
Machine Learning
Supervised Learning, Unsupervised Learning, Feature Engineering

Problem

Business Context

ShopSphere, a mid-sized e-commerce marketplace with 2.4M registered users, wants to improve lifecycle marketing. The growth team needs both: (1) customer segments for campaign design and (2) a model to predict which customers will purchase in the next 30 days.

Dataset

You are given a customer-level dataset built from the last 12 months of activity.

Feature Group      Count   Examples
Behavioral         12      sessions_30d, avg_session_duration, cart_add_rate, wishlist_events
Transactional      10      orders_90d, avg_order_value, refund_rate, days_since_last_purchase
Marketing          6       email_opens_30d, push_click_rate, coupon_redemptions
Customer profile   7       country, device_type, acquisition_channel, loyalty_tier
Target label       1       purchased_next_30d

  • Size: 180K customers, 35 input features
  • Target: Binary label indicating whether the customer makes at least one purchase in the next 30 days
  • Class balance: 28% positive, 72% negative
  • Missing data: 8% missing in marketing engagement fields, 3% missing in profile fields
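One possible way to handle the missing marketing and profile fields is median imputation paired with a missingness flag, so a model can still learn from the fact that a value was absent. The sketch below is only an illustration: the helper name impute_with_indicator and the sample values are hypothetical, and in a real pipeline the fill value would be computed on the training split only and then reused for validation, test, and scoring data.

```python
from statistics import median

def impute_with_indicator(values):
    """Fill None with the median of observed values and return a 0/1 missing flag.

    Hypothetical helper for illustration; not part of the problem statement.
    """
    observed = [v for v in values if v is not None]
    fill = median(observed)
    imputed = [fill if v is None else v for v in values]
    was_missing = [1 if v is None else 0 for v in values]
    return imputed, was_missing, fill

# Made-up sample of a marketing field with gaps (not real ShopSphere data).
email_opens_30d = [3, None, 0, 7, 1, None, 2]
imputed, was_missing, fill = impute_with_indicator(email_opens_30d)
print(imputed)      # gaps replaced by the observed median
print(was_missing)  # 1 where the original value was absent
```

Keeping the flag as its own feature is a design choice: for engagement fields, "no data" may itself be informative (for example, a customer who never opted in to email).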

Success Criteria

A strong solution should:

  • Build a supervised model with ROC-AUC >= 0.82 and F1 >= 0.68 on the held-out test set
  • Produce unsupervised customer segments that are stable and interpretable enough for marketing use
  • Clearly explain the difference between supervised and unsupervised learning through the chosen methods and outputs
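The two supervised targets above can be checked with a few lines of code. The sketch below computes ROC-AUC from its pairwise definition (the probability that a random positive is scored above a random negative) and F1 from precision and recall. The labels, scores, and the 0.5 decision threshold are made-up assumptions for illustration. The pairwise AUC is quadratic in the number of rows, so at 180K customers a rank-based implementation from a metrics library would be the practical choice.

```python
def roc_auc(y_true, scores):
    """Pairwise ROC-AUC: share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy held-out labels and model scores (illustrative values only).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2, 0.1, 0.7]
threshold = 0.5  # assumed cut-off; in practice tuned on a validation split
y_pred = [1 if s >= threshold else 0 for s in scores]

auc = roc_auc(y_true, scores)
f1 = f1_score(y_true, y_pred)
meets_targets = auc >= 0.82 and f1 >= 0.68
print(auc, f1, meets_targets)
```

ROC-AUC is threshold-free while F1 depends on the chosen cut-off, which is one reason the success criteria list both.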

Constraints

  • Batch scoring must finish in under 10 minutes for 180K customers
  • Marketing stakeholders need interpretable segment definitions
  • The solution should avoid leakage from future purchase behavior
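A common way to avoid leakage from future purchase behavior is to fix a cutoff date per training snapshot: features use only events strictly before the cutoff, and the label uses only the 30 days from the cutoff onward. The sketch below illustrates that idea under stated assumptions: the function build_example, the cutoff parameter, and the order dates are hypothetical, while the feature and label names are taken from the dataset description.

```python
from datetime import date, timedelta

def build_example(order_dates, cutoff, feature_days=90, label_days=30):
    """Build one training row with features before `cutoff` and the label after it.

    Hypothetical helper; the windowing convention is an assumption.
    """
    feature_start = cutoff - timedelta(days=feature_days)
    label_end = cutoff + timedelta(days=label_days)

    # Features: only orders strictly before the cutoff are visible.
    prior = [d for d in order_dates if d < cutoff]
    orders_90d = sum(1 for d in prior if d >= feature_start)
    days_since_last_purchase = (cutoff - max(prior)).days if prior else None

    # Label: any order in [cutoff, cutoff + 30 days).
    purchased_next_30d = int(any(cutoff <= d < label_end for d in order_dates))

    return {
        "orders_90d": orders_90d,
        "days_since_last_purchase": days_since_last_purchase,
        "purchased_next_30d": purchased_next_30d,
    }

# Made-up order history for one customer.
orders = [date(2024, 3, 15), date(2024, 5, 20), date(2024, 6, 10)]
row = build_example(orders, cutoff=date(2024, 6, 1))
print(row)
```

The order on 2024-06-10 contributes to the label only; if it were counted in orders_90d or days_since_last_purchase, the model would be seeing the outcome it is asked to predict.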

Deliverables

  1. Train a supervised learning model to predict purchased_next_30d
  2. Build an unsupervised learning workflow to segment customers
  3. Compare the goals, inputs, outputs, and evaluation methods of both approaches
  4. Describe feature engineering, validation, and production deployment choices
  5. Provide code, metrics, and a short recommendation on when to use each approach
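To make the supervised versus unsupervised contrast in deliverable 3 concrete, the toy sketch below fits two deliberately simple stand-ins on one feature: a decision stump, which needs the label purchased_next_30d to choose its threshold, and 1-D k-means (Lloyd's algorithm), which groups customers using the feature alone. These are not the models a full solution would ship; the data, initial centroids, and function names are assumptions chosen so the behavior is easy to follow.

```python
def fit_stump(x, y):
    """Supervised: choose the threshold on x that maximizes accuracy against labels y."""
    best_t, best_acc = None, -1.0
    for t in sorted(set(x)):
        acc = sum((xi >= t) == bool(yi) for xi, yi in zip(x, y)) / len(x)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc

def nearest(xi, centroids):
    """Index of the centroid closest to xi."""
    return min(range(len(centroids)), key=lambda k: abs(xi - centroids[k]))

def kmeans_1d(x, centroids, n_iter=20):
    """Unsupervised: Lloyd's algorithm on x only; labels are never used."""
    for _ in range(n_iter):
        clusters = [[] for _ in centroids]
        for xi in x:
            clusters[nearest(xi, centroids)].append(xi)
        # Recompute each centroid as its cluster mean; keep the old one if empty.
        centroids = [sum(c) / len(c) if c else centroids[j]
                     for j, c in enumerate(clusters)]
    return centroids, [nearest(xi, centroids) for xi in x]

# Made-up values for one behavioral feature and the target label.
sessions_30d = [1, 2, 2, 3, 10, 11, 12, 20, 21, 22]
purchased_next_30d = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

# Supervised: input = (features, labels); output = a prediction rule; judged by accuracy, AUC, F1.
threshold, train_acc = fit_stump(sessions_30d, purchased_next_30d)

# Unsupervised: input = features only; output = segment ids; judged by stability and interpretability.
centroids, segments = kmeans_1d(sessions_30d, centroids=[0.0, 10.0, 25.0])

print(threshold, train_acc)
print(centroids, segments)
```

The stump answers "will this customer buy?", while the clusters answer "which customers behave alike?"; a segment id can also be fed to the supervised model as an extra feature.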
