Segment and Predict Retail Customers

Business Context

ShopSphere, a mid-sized e-commerce marketplace with 2.4M registered users, wants to improve lifecycle marketing. The growth team needs two things: (1) customer segments for campaign design and (2) a model that predicts which customers will purchase in the next 30 days.

Dataset

You are given a customer-level dataset built from the last 12 months of activity.

Feature Group	Count	Examples
Behavioral	12	sessions_30d, avg_session_duration, cart_add_rate, wishlist_events
Transactional	10	orders_90d, avg_order_value, refund_rate, days_since_last_purchase
Marketing	6	email_opens_30d, push_click_rate, coupon_redemptions
Customer profile	7	country, device_type, acquisition_channel, loyalty_tier
Target label	1	purchased_next_30d

Size: 180K customers, 35 input features
Target: Binary label indicating whether the customer makes at least one purchase in the next 30 days
Class balance: 28% positive, 72% negative
Missing data: 8% missing in marketing engagement fields, 3% missing in profile fields
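One way to handle the missing marketing and profile values is sketched below with scikit-learn. It assumes the data is a pandas DataFrame and uses only the example column names from the table; the complete 35-column lists are assumed. Numeric fields get median imputation plus a "was missing" indicator, and categorical profile fields get an explicit "missing" category before one-hot encoding.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Example columns from the feature table; the real lists cover all 35 inputs.
NUMERIC_COLS = ["sessions_30d", "avg_session_duration", "orders_90d", "avg_order_value",
                "days_since_last_purchase", "email_opens_30d", "push_click_rate"]
CATEGORICAL_COLS = ["country", "device_type", "acquisition_channel", "loyalty_tier"]


def missing_share(df: pd.DataFrame, cols) -> pd.Series:
    """Fraction of missing values per column, e.g. to confirm the ~8% / ~3% gaps."""
    return df[list(cols)].isna().mean()


def build_preprocessor(numeric_cols=NUMERIC_COLS, categorical_cols=CATEGORICAL_COLS):
    numeric = Pipeline([
        # Median fill plus a 0/1 "was missing" column for every numeric field that
        # had gaps at fit time, so absent engagement data stays visible to the model.
        ("impute", SimpleImputer(strategy="median", add_indicator=True)),
        ("scale", StandardScaler()),
    ])
    categorical = Pipeline([
        # Missing profile values become their own explicit category.
        ("impute", SimpleImputer(strategy="constant", fill_value="missing")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ])
    return ColumnTransformer([
        ("num", numeric, list(numeric_cols)),
        ("cat", categorical, list(categorical_cols)),
    ])
```

Fitting this transformer inside a Pipeline on the training split only keeps test-set medians and category lists out of the imputation.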

Success Criteria

A strong solution should:

Build a supervised model with ROC-AUC >= 0.82 and F1 >= 0.68 on the held-out test set
Produce unsupervised customer segments that are stable and interpretable enough for marketing use
Clearly explain the difference between supervised and unsupervised learning through the chosen methods and outputs

Constraints

Batch scoring must finish in under 10 minutes for 180K customers
Marketing stakeholders need interpretable segment definitions
The solution should avoid leakage from future purchase behavior: features may only use activity recorded before the prediction date.
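A common way to meet the leakage constraint is a point-in-time snapshot: every feature is computed from events strictly before a cutoff date, and the label comes only from the 30 days after it. The sketch assumes a hypothetical raw event log with columns customer_id, event_time and event_type ("session" or "order"), and computes only three of the 35 features to show the pattern.

```python
import pandas as pd


def build_snapshot(events: pd.DataFrame, cutoff: pd.Timestamp,
                   horizon_days: int = 30) -> pd.DataFrame:
    """Point-in-time training rows: features from events strictly before
    `cutoff`, label from orders in [cutoff, cutoff + horizon_days).

    `events` is a hypothetical log with columns customer_id, event_time and
    event_type ("session" or "order").
    """
    past = events[events["event_time"] < cutoff]
    horizon_end = cutoff + pd.Timedelta(days=horizon_days)
    future = events[(events["event_time"] >= cutoff) & (events["event_time"] < horizon_end)]

    # Population: customers with any activity before the cutoff (an assumption;
    # the real table may instead start from all registered users).
    rows = pd.DataFrame(index=pd.Index(past["customer_id"].unique(), name="customer_id"))

    sessions = past[past["event_type"] == "session"]
    orders = past[past["event_type"] == "order"]
    recent_sessions = sessions[sessions["event_time"] >= cutoff - pd.Timedelta(days=30)]
    recent_orders = orders[orders["event_time"] >= cutoff - pd.Timedelta(days=90)]

    rows["sessions_30d"] = recent_sessions.groupby("customer_id").size()
    rows["orders_90d"] = recent_orders.groupby("customer_id").size()
    rows[["sessions_30d", "orders_90d"]] = rows[["sessions_30d", "orders_90d"]].fillna(0).astype(int)
    # Measured from the cutoff, never from "today", so later orders cannot leak in.
    rows["days_since_last_purchase"] = (cutoff - orders.groupby("customer_id")["event_time"].max()).dt.days

    # Label: did the customer place at least one order inside the horizon?
    buyers = future.loc[future["event_type"] == "order", "customer_id"].unique()
    rows["purchased_next_30d"] = rows.index.isin(buyers).astype(int)
    return rows
```

Training on one cutoff and testing on a later one (an out-of-time split) is a stricter check than a random split, because it mirrors how the model is scored in production.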

Deliverables

Train a supervised learning model to predict purchased_next_30d
Build an unsupervised learning workflow to segment customers
Compare the goals, inputs, outputs, and evaluation methods of both approaches
Describe feature engineering, validation, and production deployment choices
Provide code, metrics, and a short recommendation on when to use each approach
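For the unsupervised deliverable, a minimal sketch using k-means on a few behavioral and transactional columns (k-means is one reasonable choice, picked here for illustration). Unlike the supervised model there is no label to score against, so the sketch relies on internal measures: a sampled silhouette score to compare values of k, and an adjusted-Rand stability check that refits on random 80% subsamples and compares the resulting assignments with the full-data segmentation. The demo data is synthetic, and the log1p transform assumes non-negative features.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score, silhouette_score
from sklearn.preprocessing import StandardScaler

SEGMENT_COLS = ["sessions_30d", "orders_90d", "avg_order_value", "days_since_last_purchase"]


def fit_segmenter(X: pd.DataFrame, k: int, seed: int = 0):
    # log1p compresses long right tails (assumes non-negative values); scaling keeps
    # large-unit columns such as avg_order_value from dominating the distances.
    scaler = StandardScaler().fit(np.log1p(X))
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(scaler.transform(np.log1p(X)))
    return scaler, km


def assign(scaler, km, X: pd.DataFrame) -> np.ndarray:
    """Map customers to the nearest learned centroid (usable for batch re-scoring)."""
    return km.predict(scaler.transform(np.log1p(X)))


def silhouette_by_k(X: pd.DataFrame, ks, seed: int = 0) -> dict:
    """Sampled silhouette per k; the full score is O(n^2) in the number of rows."""
    scores = {}
    for k in ks:
        scaler, km = fit_segmenter(X, k, seed)
        Z = scaler.transform(np.log1p(X))
        scores[k] = float(silhouette_score(Z, km.labels_, sample_size=min(10_000, len(X)),
                                           random_state=seed))
    return scores


def stability(X: pd.DataFrame, k: int, n_rounds: int = 5, frac: float = 0.8, seed: int = 0) -> float:
    """Mean adjusted Rand index between the full-data segmentation and models refit
    on random subsamples; 1.0 means identical partitions (label names ignored)."""
    rng = np.random.default_rng(seed)
    reference = assign(*fit_segmenter(X, k, seed), X)
    scores = []
    for i in range(n_rounds):
        idx = rng.choice(len(X), size=int(frac * len(X)), replace=False)
        refit = fit_segmenter(X.iloc[idx], k, seed + i + 1)
        scores.append(adjusted_rand_score(reference, assign(*refit, X)))
    return float(np.mean(scores))


def profile(X: pd.DataFrame, labels: np.ndarray) -> pd.DataFrame:
    """Per-segment medians and sizes: a readable table of segment definitions."""
    summary = X.groupby(labels).median()
    summary["n_customers"] = pd.Series(labels).value_counts().sort_index().to_numpy()
    return summary


def make_demo_customers(n_per_group: int = 500, seed: int = 0) -> pd.DataFrame:
    """Synthetic stand-in with three obvious behavior groups (illustration only)."""
    rng = np.random.default_rng(seed)
    group_means = [(2, 0.2, 30, 200), (12, 2, 60, 40), (40, 9, 120, 5)]
    frames = []
    for means in group_means:
        cols = {c: np.maximum(rng.normal(m, 0.1 * m + 0.5, n_per_group), 0)
                for c, m in zip(SEGMENT_COLS, means)}
        frames.append(pd.DataFrame(cols))
    return pd.concat(frames, ignore_index=True)
```

Nothing here is scored against purchased_next_30d: the label can describe segments after the fact (for example, purchase rate per segment) but does not form them, which is the practical difference from the supervised model. Saving the fitted scaler and centroids and running assign() in the batch job keeps a customer's segment tied to their behavior rather than to a fresh refit.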
