Segment Users and Predict Churn

Business Context

StreamCart, a mid-sized subscription video platform with 2.4M monthly active users, wants to improve retention. The analytics team needs both unsupervised learning to discover natural customer segments and supervised learning to predict which users are likely to churn in the next 30 days.

Dataset

You are given a user-level dataset built from the last 12 months of activity.

Feature Group	Count	Examples
Engagement	10	weekly_watch_hours, sessions_per_week, completion_rate
Subscription	6	plan_type, tenure_days, monthly_price, auto_renew
Device & Region	5	primary_device, country, app_version
Support & Billing	5	support_tickets_90d, payment_failures_90d
Derived Behavior	6	days_since_last_watch, weekend_ratio, genre_diversity

Size: 120K users, 32 features
Target for the supervised task: churn_30d (1 if the user cancels within the next 30 days, else 0)
Unsupervised task: no target label; identify meaningful user segments
Class balance: 14% churn, 86% retained
Missing data: ~8% missing in support and billing fields, ~3% missing in device metadata
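The dataset notes imply three preprocessing needs: imputation for the missing support, billing, and device fields; encoding for categorical columns such as plan_type and primary_device; and a stratified split so the 14% churn rate carries into the holdout set. A minimal sketch, assuming scikit-learn and pandas. The six-column subset, the synthetic data generator (make_synthetic_users is hypothetical), and the imputation strategies are illustrative choices, not part of the brief.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Illustrative subset of the 32 features, named after the table above.
NUMERIC = ["weekly_watch_hours", "tenure_days", "support_tickets_90d", "payment_failures_90d"]
CATEGORICAL = ["plan_type", "primary_device"]


def make_synthetic_users(n=2000, seed=0):
    """Hypothetical stand-in for the real user table: synthetic values only."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "weekly_watch_hours": rng.gamma(2.0, 3.0, n),
        "tenure_days": rng.integers(1, 1500, n),
        "support_tickets_90d": rng.poisson(0.5, n).astype(float),
        "payment_failures_90d": rng.poisson(0.2, n).astype(float),
        "plan_type": rng.choice(["basic", "standard", "premium"], n),
        "primary_device": rng.choice(["tv", "mobile", "web"], n).astype(object),
        "churn_30d": (rng.random(n) < 0.14).astype(int),
    })
    # Inject missing values at roughly the rates stated in the brief.
    for col in ["support_tickets_90d", "payment_failures_90d"]:
        df.loc[rng.random(n) < 0.08, col] = np.nan
    df.loc[rng.random(n) < 0.03, "primary_device"] = np.nan
    return df


preprocess = ColumnTransformer(
    [
        # Numeric: median imputation plus was-missing flags, then standardize all outputs.
        ("num", Pipeline([
            ("impute", SimpleImputer(strategy="median", add_indicator=True)),
            ("scale", StandardScaler()),
        ]), NUMERIC),
        # Categorical: fill with the most frequent value, then one-hot encode.
        ("cat", Pipeline([
            ("impute", SimpleImputer(strategy="most_frequent")),
            ("onehot", OneHotEncoder(handle_unknown="ignore")),
        ]), CATEGORICAL),
    ],
    sparse_threshold=0.0,  # always return a dense array
)

users = make_synthetic_users()
X, y = users[NUMERIC + CATEGORICAL], users["churn_30d"]
# Stratify on the label so train and holdout keep the same ~14% churn rate.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
X_train_t = preprocess.fit_transform(X_train)  # fit imputers/encoders on training rows only
X_test_t = preprocess.transform(X_test)
```

The was-missing indicators preserve a possible signal in the missingness itself (for example, absent billing records); whether that helps on the real data is an empirical question.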

Success Criteria

A strong solution should:

Build a churn classifier with ROC-AUC >= 0.84 and F1 >= 0.55 on the holdout set
Produce 3-6 interpretable user segments with clear behavioral differences
Clearly explain when supervised learning is appropriate and when unsupervised learning is

Constraints

Predictions are generated in a nightly batch job for 120K users
Marketing needs segment definitions simple enough to act on
The retention team requires feature importance for churn predictions

Deliverables

Train one supervised model to predict churn_30d
Train one unsupervised model to segment users
Compare the goals, inputs, outputs, and evaluation of both approaches
Describe preprocessing and feature engineering choices
Report metrics and recommend how both models would be used together in production
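One way the two models might work together in the nightly job: both run over the same feature snapshot, every user receives a segment label and a churn probability, and a per-segment summary shows where at-risk users concentrate so retention offers can be tailored by segment. This is a sketch under stated assumptions: the models are stand-ins fit on synthetic data, and nightly_batch, the output column names, and the 0.35 cutoff are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data: the first 5,000 rows stand in for training history,
# the remaining 120,000 for tonight's active users.
X, y = make_classification(n_samples=125_000, n_features=32, n_informative=10,
                           weights=[0.86, 0.14], random_state=1)
X_hist, y_hist, X_today = X[:5000], y[:5000], X[5000:]

# Stand-ins for the two trained models (any fitted estimators with these methods would do).
churn_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)).fit(X_hist, y_hist)
segment_model = make_pipeline(
    StandardScaler(), KMeans(n_clusters=4, n_init=10, random_state=1)).fit(X_hist)


def nightly_batch(X_batch, user_ids, threshold):
    """Attach a segment label and a churn probability to every user in one pass."""
    out = pd.DataFrame({
        "user_id": user_ids,
        "segment": segment_model.predict(X_batch),
        "churn_prob": churn_model.predict_proba(X_batch)[:, 1],
    })
    out["flag_for_retention"] = out["churn_prob"] >= threshold
    return out


# Hypothetical cutoff; in practice it would come from validation-set tuning.
scores = nightly_batch(X_today, np.arange(len(X_today)), threshold=0.35)

# Segment-level summary: how many users, how risky, and how many flagged per segment.
summary = scores.groupby("segment").agg(
    users=("user_id", "size"),
    mean_churn_prob=("churn_prob", "mean"),
    flagged=("flag_for_retention", "sum"),
)
print(summary)
```

At 120K users by 32 features the whole batch is roughly 31 MB as float64, so a single in-memory pass is enough and no chunking is needed.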

