Choose Churn Model for SaaS

Business Context

PulseBoard, a mid-market SaaS analytics company, wants a 30-day churn prediction model for 120K customer accounts. The retention team needs a model that is accurate enough to prioritize outreach but also explainable enough to justify interventions.

Dataset

You are given an account-level monthly training set built from product usage, billing, support, and account metadata.

Feature Group	Count	Examples
Usage metrics	14	weekly_active_users, report_exports_30d, sessions_per_user
Billing	6	mrr, payment_failures_90d, plan_tier
Support	5	open_tickets_30d, avg_resolution_hours, csat_score
Account profile	7	company_size, industry, region, contract_type
Temporal / derived	10	days_since_last_login, usage_change_30d, ticket_rate_per_user

Size: 120K account-month rows, 42 features, 18 months of history
Target: Binary label indicating whether the account churns in the next 30 days
Class balance: 11.5% churn, 88.5% retained
Missing data: 12% missing in CSAT, 7% missing in support features, <2% elsewhere

Success Criteria

A good solution should demonstrate whether a regularized logistic regression baseline is sufficient or whether a more complex model such as gradient boosting materially improves business outcomes. “Good enough” means achieving ROC-AUC >= 0.82, PR-AUC >= 0.42, and providing a clear recommendation based on both model quality and operational constraints.

Constraints

Predictions run nightly in batch for 120K accounts
Retention managers require interpretable drivers for each prediction
Retraining should be feasible monthly with standard Python tooling
Avoid unnecessary model complexity if gains are marginal

Deliverables

Build a strong logistic regression baseline with appropriate preprocessing.
Train at least one more complex model and compare it fairly.
Define the evaluation framework and thresholding strategy.
Explain when logistic regression should be preferred over the complex model.
Provide production-ready training and evaluation code.

Business Context

Dataset

You are given an account-level monthly training set built from product usage, billing, support, and account metadata.

Feature Group	Count	Examples
Usage metrics	14	weekly_active_users, report_exports_30d, sessions_per_user
Billing	6	mrr, payment_failures_90d, plan_tier
Support	5	open_tickets_30d, avg_resolution_hours, csat_score
Account profile	7	company_size, industry, region, contract_type
Temporal / derived	10	days_since_last_login, usage_change_30d, ticket_rate_per_user

Size: 120K account-month rows, 42 features, 18 months of history
Target: Binary label indicating whether the account churns in the next 30 days
Class balance: 11.5% churn, 88.5% retained
Missing data: 12% missing in CSAT, 7% missing in support features, <2% elsewhere

Success Criteria

Constraints

Predictions run nightly in batch for 120K accounts
Retention managers require interpretable drivers for each prediction
Retraining should be feasible monthly with standard Python tooling
Avoid unnecessary model complexity if gains are marginal

Deliverables

Build a strong logistic regression baseline with appropriate preprocessing.
Train at least one more complex model and compare it fairly.
Define the evaluation framework and thresholding strategy.
Explain when logistic regression should be preferred over the complex model.
Provide production-ready training and evaluation code.

Business Context

Dataset

You are given an account-level monthly training set built from product usage, billing, support, and account metadata.

Feature Group	Count	Examples
Usage metrics	14	weekly_active_users, report_exports_30d, sessions_per_user
Billing	6	mrr, payment_failures_90d, plan_tier
Support	5	open_tickets_30d, avg_resolution_hours, csat_score
Account profile	7	company_size, industry, region, contract_type
Temporal / derived	10	days_since_last_login, usage_change_30d, ticket_rate_per_user

Size: 120K account-month rows, 42 features, 18 months of history
Target: Binary label indicating whether the account churns in the next 30 days
Class balance: 11.5% churn, 88.5% retained
Missing data: 12% missing in CSAT, 7% missing in support features, <2% elsewhere

Success Criteria

Constraints

Predictions run nightly in batch for 120K accounts
Retention managers require interpretable drivers for each prediction
Retraining should be feasible monthly with standard Python tooling
Avoid unnecessary model complexity if gains are marginal

Deliverables

Build a strong logistic regression baseline with appropriate preprocessing.
Train at least one more complex model and compare it fairly.
Define the evaluation framework and thresholding strategy.
Explain when logistic regression should be preferred over the complex model.
Provide production-ready training and evaluation code.

Business Context

Dataset

You are given an account-level monthly training set built from product usage, billing, support, and account metadata.

Feature Group	Count	Examples
Usage metrics	14	weekly_active_users, report_exports_30d, sessions_per_user
Billing	6	mrr, payment_failures_90d, plan_tier
Support	5	open_tickets_30d, avg_resolution_hours, csat_score
Account profile	7	company_size, industry, region, contract_type
Temporal / derived	10	days_since_last_login, usage_change_30d, ticket_rate_per_user

Size: 120K account-month rows, 42 features, 18 months of history
Target: Binary label indicating whether the account churns in the next 30 days
Class balance: 11.5% churn, 88.5% retained
Missing data: 12% missing in CSAT, 7% missing in support features, <2% elsewhere

Success Criteria

Constraints

Predictions run nightly in batch for 120K accounts
Retention managers require interpretable drivers for each prediction
Retraining should be feasible monthly with standard Python tooling
Avoid unnecessary model complexity if gains are marginal

Deliverables

Build a strong logistic regression baseline with appropriate preprocessing.
Train at least one more complex model and compare it fairly.
Define the evaluation framework and thresholding strategy.
Explain when logistic regression should be preferred over the complex model.
Provide production-ready training and evaluation code.

Interview Guides

Business Context

Dataset

Success Criteria

Constraints

Deliverables

Choose Churn Model for SaaS

Business Context

Dataset

Success Criteria

Constraints

Deliverables

Your Answer

Choose Churn Model for SaaS

Business Context

Dataset

Success Criteria

Constraints

Deliverables

Choose Churn Model for SaaS

Business Context

Dataset

Success Criteria

Constraints

Deliverables

Your Answer