Predict Insurance Policy Churn

Easy

Machine Learning

Business Context

ShieldSure, a mid-sized digital insurer with 420K active auto and home policies, wants to predict which customers will churn at renewal so the retention team can intervene before the policy lapses. The model will be used in a weekly batch workflow to prioritize outbound offers and agent follow-up.

Dataset

The training data is built at the policy-renewal opportunity level. Each row represents a policy 45 days before its renewal date, with features aggregated from the prior 12 months.
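One way to read this grain in code: the snapshot date is the renewal date minus 45 days, and only events dated before the snapshot and inside the prior 12 months feed the features. The sketch below is a minimal illustration in plain Python. The function names, the `(date, cost)` event layout, and the 365-day approximation of "12 months" are assumptions made for the example, not part of the dataset specification.

```python
from datetime import date, timedelta

SNAPSHOT_LEAD_DAYS = 45   # from the dataset description
LOOKBACK_DAYS = 365       # assumption: "prior 12 months" approximated as 365 days


def snapshot_date(renewal_date: date) -> date:
    """Date at which the row is observed: 45 days before renewal."""
    return renewal_date - timedelta(days=SNAPSHOT_LEAD_DAYS)


def claims_features(renewal_date: date, claims: list[tuple[date, float]]) -> dict:
    """Aggregate claims dated in [snapshot - 365d, snapshot); later events are ignored."""
    snap = snapshot_date(renewal_date)
    start = snap - timedelta(days=LOOKBACK_DAYS)
    in_window = [cost for day, cost in claims if start <= day < snap]
    return {
        "claim_count_12m": len(in_window),
        "total_claim_cost_12m": sum(in_window),
    }


renewal = date(2024, 6, 30)
claims = [
    (date(2023, 1, 10), 900.0),   # older than the lookback window: excluded
    (date(2024, 2, 1), 1200.0),   # inside the window: included
    (date(2024, 6, 1), 300.0),    # after the snapshot: excluded to avoid leakage
]
features = claims_features(renewal, claims)
```

Filtering on the snapshot date rather than the renewal date is what keeps events from the final 45 days out of the features.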

Feature Group	Count	Examples
Policy & pricing	12	premium_amount, premium_change_pct, deductible, coverage_type
Customer profile	9	tenure_months, age_band, state, bundled_products
Claims history	8	claim_count_12m, total_claim_cost_12m, recent_claim_flag
Billing & payment	7	autopay_flag, late_payment_count, payment_method
Engagement	6	app_logins_90d, email_open_rate, agent_contacts_180d
Service interactions	5	complaint_count, call_center_contacts, NPS_bucket

Size: 310K renewal records across 24 months, 47 features
Target: Churned at renewal within 45 days (1) vs renewed/retained (0)
Class balance: ~14% positive, 86% negative
Missing data: 18% missing in NPS and engagement fields, 6% missing in claims-related fields for new policyholders
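Because missingness here is tied to customer type (for example, new policyholders without claims history), one reasonable option is to keep a missing-value flag next to each imputed field instead of imputing silently. The following is a sketch under that assumption. `impute_with_flag`, the `policy_id` key, and the median fill are illustrative choices, not something the problem statement prescribes.

```python
from statistics import median


def impute_with_flag(rows: list[dict], column: str) -> list[dict]:
    """Fill missing numeric values with the column median and add `<column>_missing`."""
    observed = [r[column] for r in rows if r.get(column) is not None]
    fill = median(observed) if observed else 0.0
    out = []
    for r in rows:
        missing = r.get(column) is None
        new_row = dict(r)                      # leave the input rows untouched
        new_row[column] = fill if missing else r[column]
        new_row[f"{column}_missing"] = int(missing)
        out.append(new_row)
    return out


rows = [
    {"policy_id": "A", "app_logins_90d": 4},
    {"policy_id": "B", "app_logins_90d": None},   # engagement not captured
    {"policy_id": "C", "app_logins_90d": 10},
]
prepared = impute_with_flag(rows, "app_logins_90d")
```

In a real pipeline the fill value would be computed on the training period only and reused at scoring time, so that the test period does not influence the imputation.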

Success Criteria

A good solution should identify high-risk policies early enough for retention outreach, achieve strong ranking quality, and remain interpretable for pricing and operations stakeholders. Target performance is ROC-AUC > 0.82 and recall > 70% at precision >= 35% on the holdout period.
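The numeric targets above can be checked directly from holdout labels and scores. The sketch below uses plain Python and toy data to show what "recall at precision >= 35%" means operationally. The function names are hypothetical, the pairwise AUC is quadratic and meant only for illustration, and a library such as scikit-learn would normally be used on the full holdout set.

```python
def roc_auc(y_true, scores):
    """Probability that a random churner outscores a random non-churner (ties count half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def precision_recall_at(y_true, scores, threshold):
    """Precision and recall when flagging every policy with score >= threshold."""
    tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
    fn = sum(1 for y, s in zip(y_true, scores) if s < threshold and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def best_recall_at_precision(y_true, scores, min_precision=0.35):
    """Highest recall among thresholds whose precision meets the floor.

    Returns (threshold, precision, recall), or None if no threshold qualifies.
    """
    best = None
    for t in sorted(set(scores)):
        p, r = precision_recall_at(y_true, scores, t)
        if p >= min_precision and (best is None or r > best[2]):
            best = (t, p, r)
    return best


# Toy holdout data, invented for the example.
y_true = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]

auc = roc_auc(y_true, scores)
result = best_recall_at_precision(y_true, scores, min_precision=0.35)
meets_recall_target = result is not None and result[2] > 0.70
```

Sweeping thresholds and keeping the best recall that still satisfies the precision floor mirrors how the criterion is phrased: precision is the constraint, recall is the quantity being maximized.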

Constraints

Weekly batch scoring of up to 500K policies must finish in under 30 minutes
The retention team needs reason codes or feature importance for flagged policies
Avoid temporal leakage from post-renewal or near-decision events
Retraining should be feasible monthly with standard Python tooling
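For the reason-code constraint, a linear baseline offers a simple route: each feature's contribution to the log-odds is its coefficient times its (standardized) value, and the largest positive contributions can be reported as reasons. The sketch below assumes a logistic regression on standardized features. The coefficients and the policy values are made up for illustration and are not fitted on any real data.

```python
import math


def churn_probability(weights, intercept, x):
    """Logistic-regression score for one standardized feature vector."""
    z = intercept + sum(weights[name] * x[name] for name in weights)
    return 1.0 / (1.0 + math.exp(-z))


def reason_codes(weights, x, top_k=3):
    """Features pushing the score up the most, ranked by weight * value."""
    contributions = {name: weights[name] * x[name] for name in weights}
    ranked = sorted(contributions.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, round(c, 3)) for name, c in ranked[:top_k] if c > 0]


# Hypothetical coefficients on standardized features (not fitted on real data).
weights = {
    "premium_change_pct": 0.8,
    "late_payment_count": 0.5,
    "tenure_months": -0.6,
    "autopay_flag": -0.4,
}
# One policy: large premium increase, short tenure, one late payment, autopay on.
policy = {
    "premium_change_pct": 2.0,
    "late_payment_count": 1.0,
    "tenure_months": -1.5,
    "autopay_flag": 1.0,
}
codes = reason_codes(weights, policy, top_k=2)
score = churn_probability(weights, intercept=-1.8, x=policy)
```

Here the short tenure shows up as a reason because a negative weight times a below-average value yields a positive contribution. For a non-linear model, per-row attributions (for example SHAP-style values) would play the same role, at a higher scoring cost.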

Deliverables

Build a churn prediction pipeline for insurance policy renewals.
Explain feature engineering, leakage prevention, and model choice.
Compare at least one interpretable baseline with one stronger non-linear model.
Select a decision threshold aligned to retention team capacity.
Report evaluation metrics on a time-based test set and summarize business tradeoffs.
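Two of these deliverables, the time-based test set and the capacity-aligned threshold, are mechanical enough to sketch. The example below is a minimal plain-Python illustration. The `renewal_date` and `policy_id` keys, the cutoff date, and the capacity of three policies per week are assumptions chosen for the example.

```python
from datetime import date


def time_based_split(rows, cutoff):
    """Train on renewals before the cutoff, test on renewals on or after it."""
    train = [r for r in rows if r["renewal_date"] < cutoff]
    test = [r for r in rows if r["renewal_date"] >= cutoff]
    return train, test


def capacity_threshold(scores, weekly_capacity):
    """Lowest score still flagged when only the top `weekly_capacity` policies are worked."""
    if weekly_capacity <= 0 or not scores:
        return None
    ranked = sorted(scores, reverse=True)
    return ranked[min(weekly_capacity, len(ranked)) - 1]


rows = [
    {"policy_id": "A", "renewal_date": date(2023, 3, 1)},
    {"policy_id": "B", "renewal_date": date(2023, 11, 15)},
    {"policy_id": "C", "renewal_date": date(2024, 2, 1)},
]
train, test = time_based_split(rows, cutoff=date(2024, 1, 1))

# Weekly batch scores (toy values); the team can work 3 policies this week.
scores = [0.91, 0.15, 0.62, 0.48, 0.77, 0.33]
threshold = capacity_threshold(scores, weekly_capacity=3)
flagged = [s for s in scores if s >= threshold]
```

Splitting on the renewal date keeps every test row later than every training row, which matches how the model is used in production. The capacity-based cut-off can then be compared against the precision and recall it implies on the holdout period.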
