Classify and Segment Retail Customers

Business Context

Northstar Retail, an e-commerce marketplace with 2.4M annual customers, wants to improve lifecycle marketing. The data science team needs both a supervised model to predict whether a customer will purchase in the next 30 days and an unsupervised model to segment customers for campaign targeting.

Dataset

You are given a customer-level dataset built from the last 12 months of activity.

Feature Group	Count	Examples
Transaction history	8	total_orders, avg_order_value, days_since_last_purchase
Engagement	6	email_open_rate, app_sessions_30d, website_visits_30d
Customer profile	5	region, acquisition_channel, loyalty_tier
Product behavior	5	categories_bought, discount_usage_rate, return_rate
Target label	1	purchased_next_30d

Size: 120K customers, 24 input features, 1 binary target
Target: purchased_next_30d, where 1 indicates a purchase in the next 30 days and 0 indicates no purchase
Class balance: 28% positive, 72% negative
Missing data: ~7% missing in engagement fields, ~3% missing in profile fields
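Both models need the missing values handled before training. Distance-based clustering such as k-means cannot consume missing values at all, and missingness in engagement fields may itself carry signal (for example, a customer with no app activity recorded). The sketch below is one simple option, written in standard-library Python: numeric fields get the training-set median plus a 0/1 "was missing" flag, and categorical profile fields get an explicit "unknown" level. The field lists reuse example column names from the table; the full 24-column schema is not given, so the lists and the row format (a list of dicts, with None marking a missing value) are assumptions.

```python
from statistics import median

# Hypothetical field lists built from the example columns in the table;
# the real schema has more fields than shown here.
NUMERIC_FIELDS = ["email_open_rate", "app_sessions_30d", "website_visits_30d"]
CATEGORICAL_FIELDS = ["region", "acquisition_channel", "loyalty_tier"]


def fit_imputer(rows):
    """Learn per-field medians from the TRAINING rows only, so test data never leaks in."""
    medians = {}
    for field in NUMERIC_FIELDS:
        observed = [r[field] for r in rows if r.get(field) is not None]
        medians[field] = median(observed) if observed else 0.0
    return medians


def transform(rows, medians):
    """Fill numeric gaps with the learned median and add a 0/1 missing flag;
    fill categorical gaps with an explicit 'unknown' level. Input rows are not mutated."""
    out = []
    for r in rows:
        new = dict(r)
        for field in NUMERIC_FIELDS:
            missing = r.get(field) is None
            new[field] = medians[field] if missing else r[field]
            new[field + "_missing"] = int(missing)
        for field in CATEGORICAL_FIELDS:
            if r.get(field) is None:
                new[field] = "unknown"
        out.append(new)
    return out
```

Fitting on the training split and reusing the same medians at scoring time keeps batch predictions consistent with what the model saw during training.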

Success Criteria

A strong solution should:

Explain clearly when to use supervised learning vs unsupervised learning
Build a purchase prediction model with ROC-AUC >= 0.82 and F1 >= 0.68 on the test set
Produce customer segments that are stable, interpretable, and useful for marketing
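The two supervised targets behave differently. ROC-AUC is threshold-free and measures ranking quality (0.5 for an uninformative ranking), while F1 depends on the cutoff used to turn probabilities into 0/1 predictions. With 28% positives, flagging every customer as a purchaser gives precision 0.28 and recall 1.0, so F1 = 2 × 0.28 / 1.28 ≈ 0.44; the F1 >= 0.68 target should be read against that floor. The sketch below computes both metrics from scratch in plain Python for illustration (in practice scikit-learn's metric functions would be the usual choice); choosing the cutoff on a validation split, not the test set, is an assumed workflow.

```python
def roc_auc(labels, scores):
    """ROC-AUC via the rank-sum (Mann-Whitney U) identity: the probability that a
    random positive outscores a random negative, with tied scores counted as half."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank shared by a block of ties
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC-AUC needs both classes present")
    pos_rank_sum = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def f1_at_threshold(labels, scores, threshold):
    """F1 when every score >= threshold is predicted as a purchase."""
    tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
    fn = sum(1 for y, s in zip(labels, scores) if s < threshold and y == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(labels, scores):
    """F1-maximizing cutoff among the observed scores (run on a validation split)."""
    return max(sorted(set(scores)), key=lambda t: f1_at_threshold(labels, scores, t))


def all_positive_f1(prevalence):
    """Baseline F1 from predicting 1 for everyone: precision = prevalence, recall = 1."""
    return 2 * prevalence / (1 + prevalence)
```

Reporting the chosen threshold alongside the test-set F1 makes the result reproducible, since the same model can land above or below 0.68 depending on the cutoff.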

Constraints

Marketing needs segment definitions simple enough to explain to non-technical stakeholders
Batch scoring must finish in under 10 minutes for 120K customers
The solution should be maintainable by a small ML team using standard Python tooling
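The scoring constraint translates into a throughput floor: 120,000 customers in 10 minutes (600 seconds) means at least 200 customers per second end to end, including feature loading. A minimal sketch of a chunked batch job that enforces the budget is shown below; `score_chunk` is a hypothetical callable standing in for whatever wraps the trained model, and the 10,000-row chunk size is an arbitrary assumption. Chunking bounds memory use, and checking elapsed time after each chunk makes a regression fail loudly instead of silently overrunning the marketing window.

```python
import time

N_CUSTOMERS = 120_000
BUDGET_SECONDS = 10 * 60
REQUIRED_RATE = N_CUSTOMERS / BUDGET_SECONDS  # minimum customers per second


def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def batch_score(customers, score_chunk, chunk_size=10_000, budget_seconds=BUDGET_SECONDS):
    """Score customers chunk by chunk with a hypothetical `score_chunk(chunk) -> list`.
    The budget is checked after each chunk; exceeding it raises TimeoutError.
    Returns (scores, elapsed_seconds)."""
    started = time.perf_counter()
    scores = []
    for chunk in chunked(customers, chunk_size):
        scores.extend(score_chunk(chunk))
        elapsed = time.perf_counter() - started
        if elapsed > budget_seconds:
            raise TimeoutError(
                f"scoring exceeded {budget_seconds}s after {len(scores)} customers"
            )
    return scores, time.perf_counter() - started
```

Logging `len(scores) / elapsed` from each run gives an early warning before throughput drops anywhere near the 200-per-second floor.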

Deliverables

Explain the difference between supervised and unsupervised learning using this dataset
Train a supervised model for purchased_next_30d
Train an unsupervised clustering model for customer segmentation
Compare how the data preparation, training objective, and evaluation differ between the two approaches
Recommend how both models would be used together in production
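One way the two models can work together in production: the clustering model answers "what kind of customer is this?" and needs no labels, while the classifier answers "how likely is this customer to buy in the next 30 days?" and is trained on purchased_next_30d. The sketch below assumes a k-means model was fitted offline on standardized features and exported as centroids, then assigns each customer to the nearest centroid, turns each centroid into a plain-language description for stakeholders, and combines segment and purchase probability into a campaign action. All feature choices, means, standard deviations, centroid values, segment names, and action rules here are hypothetical placeholders, not results from this dataset.

```python
from math import dist  # Euclidean distance, Python 3.8+

# Hypothetical artifacts from an offline training job (placeholders, not real results).
FEATURES = ["total_orders", "avg_order_value", "days_since_last_purchase"]
MEANS = {"total_orders": 6.0, "avg_order_value": 55.0, "days_since_last_purchase": 60.0}
STDS = {"total_orders": 4.0, "avg_order_value": 20.0, "days_since_last_purchase": 45.0}
# Centroids in standardized (z-score) space, in FEATURES order.
CENTROIDS = {
    "frequent_buyers": [1.2, 0.3, -0.8],
    "big_ticket_occasional": [-0.3, 1.5, 0.2],
    "lapsing": [-0.8, -0.4, 1.3],
}


def standardize(customer):
    """Convert raw feature values to z-scores using the training means and stds."""
    return [(customer[f] - MEANS[f]) / STDS[f] for f in FEATURES]


def assign_segment(customer):
    """Nearest-centroid assignment, the same rule k-means uses to label points."""
    z = standardize(customer)
    return min(CENTROIDS, key=lambda name: dist(z, CENTROIDS[name]))


def describe_segment(name, threshold=0.5):
    """Plain-language summary: features whose centroid sits well above or below average."""
    parts = []
    for feature, value in zip(FEATURES, CENTROIDS[name]):
        if value >= threshold:
            parts.append(f"high {feature}")
        elif value <= -threshold:
            parts.append(f"low {feature}")
    return ", ".join(parts) or "close to average"


def choose_action(segment, purchase_prob, high=0.6, low=0.2):
    """Hypothetical rules combining the segment with the supervised purchase probability."""
    if purchase_prob >= high:
        return f"{segment}: no discount, cross-sell"
    if purchase_prob <= low and segment == "lapsing":
        return f"{segment}: win-back offer"
    return f"{segment}: standard nurture email"
```

Keeping segment assignment as a fixed nearest-centroid lookup means segments only change when the team deliberately refits and re-approves the centroids, which supports the stability and explainability requirements.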