Imbalanced Support Ticket Routing

Business Context

HelpHive, a SaaS customer support platform handling roughly 1.8 million tickets per year, wants to automatically classify incoming support tickets into issue categories such as billing, login, bug report, feature request, cancellation, and account security. The current rule-based system performs poorly on rare but high-priority classes, especially account security and cancellation.

Dataset

You are given a historical ticket dataset built from the first message in each support conversation.

Feature Group	Count	Examples
Text features	1 raw field	subject + first_message
Numerical metadata	11	customer_tenure_days, prior_ticket_count_90d, sentiment_score, message_length
Categorical metadata	7	plan_tier, language, channel, region, device_type
Temporal features	4	hour_of_day, day_of_week, days_since_last_ticket, month

Size: 420K labeled tickets, 23 engineered non-text features plus raw text
Target: Multiclass ticket category (8 classes)
Class balance: Highly skewed; the top 2 classes account for 74% of tickets, and the smallest class is 1.6%
Missing data: 12% missing sentiment scores, 7% missing device_type, 3% missing tenure for migrated customers
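
One common starting point for skew like this is inverse-frequency class weighting. The sketch below computes weights as N / (K * n_c), the same heuristic scikit-learn applies for class_weight="balanced". The label counts are a hypothetical toy sample, since the source only states the top-2 share and the smallest class share.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {class: N / (K * n_c)} so rare classes get larger weights."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


# Hypothetical toy labels; the real per-class shares are not given in full.
labels = (
    ["billing"] * 40
    + ["login"] * 34
    + ["bug_report"] * 24
    + ["account_security"] * 2
)
weights = balanced_class_weights(labels)
for cls, w in sorted(weights.items(), key=lambda kv: kv[1]):
    print(f"{cls:18s} {w:.3f}")
```

Weights of this kind can be passed to most classifiers as per-class or per-sample weights. They shift the decision boundary toward rare classes, so precision on the majority classes should be rechecked afterwards.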

Success Criteria

A good solution should improve minority-class detection without causing a large drop in overall precision. Target at least 0.72 macro F1, 0.88 weighted F1, and recall above 0.70 for the two rare operationally critical classes: account_security and cancellation.
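
To make these targets concrete, the following sketch computes per-class precision, recall, and F1, then macro F1 (unweighted mean over classes) and weighted F1 (mean weighted by class support), and checks them against the thresholds above. The function names and the toy predictions are illustrative assumptions, not part of the problem statement.

```python
from collections import Counter

RARE_CLASSES = ("account_security", "cancellation")


def per_class_scores(y_true, y_pred):
    """Per-class precision, recall, F1, and support from two label lists."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1
            fn[t] += 1
    scores = {}
    for c in sorted(set(y_true) | set(y_pred)):
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[c] = {"precision": precision, "recall": recall,
                     "f1": f1, "support": tp[c] + fn[c]}
    return scores


def macro_f1(scores):
    """Unweighted mean of per-class F1: every class counts equally."""
    return sum(s["f1"] for s in scores.values()) / len(scores)


def weighted_f1(scores):
    """Support-weighted mean of per-class F1: dominated by large classes."""
    total = sum(s["support"] for s in scores.values())
    return sum(s["f1"] * s["support"] for s in scores.values()) / total


def meets_targets(scores):
    """Check the stated targets: macro >= 0.72, weighted >= 0.88, rare recall > 0.70."""
    rare_ok = all(scores[c]["recall"] > 0.70 for c in RARE_CLASSES if c in scores)
    return macro_f1(scores) >= 0.72 and weighted_f1(scores) >= 0.88 and rare_ok


# Hypothetical toy predictions for illustration only.
y_true = ["billing"] * 6 + ["cancellation"] * 2 + ["account_security"] * 2
y_pred = (["billing"] * 5 + ["cancellation"]
          + ["cancellation", "billing"]
          + ["account_security"] * 2)
scores = per_class_scores(y_true, y_pred)
print(round(macro_f1(scores), 3), round(weighted_f1(scores), 3), meets_targets(scores))
```

In this toy case macro F1 clears its target while weighted F1 and cancellation recall do not, which shows why the three criteria need to be checked separately rather than through a single aggregate.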

Constraints

Inference must complete in <100 ms per ticket in an online API
Support operations need class-level explanations and feature importance
Retraining should be feasible on a weekly cadence with moderate cloud cost
The solution must avoid leakage from future ticket outcomes or agent actions
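
The leakage constraint can be approached in two steps: drop any field that is only known after the ticket is created, and validate on tickets that are strictly later than the training data. The sketch below assumes a created_at timestamp per ticket and uses hypothetical post-creation field names (resolution_code, assigned_agent_id, time_to_resolve_hours); none of these appear in the dataset description.

```python
from datetime import datetime

# Hypothetical fields populated by agents or outcomes after ticket creation.
LEAKY_FIELDS = {"resolution_code", "assigned_agent_id", "time_to_resolve_hours"}


def strip_leaky_fields(ticket):
    """Keep only fields that are available at ticket creation time."""
    return {k: v for k, v in ticket.items() if k not in LEAKY_FIELDS}


def out_of_time_split(tickets, cutoff):
    """Train on tickets created before `cutoff`, validate on the rest."""
    train = [strip_leaky_fields(t) for t in tickets if t["created_at"] < cutoff]
    valid = [strip_leaky_fields(t) for t in tickets if t["created_at"] >= cutoff]
    return train, valid


# Hypothetical toy tickets for illustration only.
tickets = [
    {"created_at": datetime(2024, 1, 5), "label": "billing",
     "resolution_code": "refund"},
    {"created_at": datetime(2024, 2, 9), "label": "login",
     "assigned_agent_id": 17},
    {"created_at": datetime(2024, 3, 2), "label": "cancellation",
     "resolution_code": "churned"},
]
train, valid = out_of_time_split(tickets, cutoff=datetime(2024, 3, 1))
print(len(train), len(valid))
```

An out-of-time split also mirrors the weekly retraining cadence: each model is evaluated on tickets it could not have seen, which is closer to production than a random split.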

Deliverables

Build a multiclass classification pipeline for skewed support ticket data.
Explain how you handle class imbalance in both training and evaluation.
Design preprocessing for text, categorical, and numerical features with missing values.
Choose a validation strategy and justify it.
Recommend a deployment-ready thresholding or calibration approach for rare classes.
Report final metrics, confusion patterns, and the main tradeoffs of your design.
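
As a possible baseline for the first three deliverables, the sketch below combines TF-IDF text features, median imputation with missing-value indicators for numeric columns, and constant-fill plus one-hot encoding for categorical columns, feeding a class-weighted logistic regression. It assumes pandas and scikit-learn, a single combined column named text (subject + first_message), and only a small subset of the listed metadata columns. The toy rows are invented for illustration, and this is one reasonable design rather than a reference solution.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

NUMERIC = ["customer_tenure_days", "sentiment_score"]
CATEGORICAL = ["plan_tier", "device_type"]

preprocess = ColumnTransformer([
    # A string (not a list) selects a 1-D column, as TfidfVectorizer expects.
    ("text", TfidfVectorizer(ngram_range=(1, 2), sublinear_tf=True), "text"),
    ("num", Pipeline([
        # add_indicator keeps "was missing" as a feature of its own.
        ("impute", SimpleImputer(strategy="median", add_indicator=True)),
        ("scale", StandardScaler()),
    ]), NUMERIC),
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="constant", fill_value="missing")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]), CATEGORICAL),
])

model = Pipeline([
    ("prep", preprocess),
    # class_weight="balanced" reweights classes by inverse frequency.
    ("clf", LogisticRegression(class_weight="balanced", max_iter=1000)),
])

# Hypothetical toy rows; real training data would be the 420K labeled tickets.
X = pd.DataFrame({
    "text": ["refund for double charge", "cannot log in to dashboard",
             "someone changed my password", "invoice shows wrong amount",
             "password reset link expired", "unknown login from new device"],
    "customer_tenure_days": [400.0, 30.0, np.nan, 720.0, 15.0, 200.0],
    "sentiment_score": [-0.4, np.nan, -0.8, -0.2, -0.1, -0.9],
    "plan_tier": ["pro", "free", "pro", "enterprise", "free", "pro"],
    "device_type": ["web", np.nan, "ios", "web", "android", "web"],
})
y = ["billing", "login", "account_security",
     "billing", "login", "account_security"]

model.fit(X, y)
proba = model.predict_proba(X)
print(model.classes_, proba.shape)
```

A linear model over sparse features keeps inference cheap and exposes per-class coefficients, which may help with the explanation and latency constraints. The predicted probabilities can then feed a separate calibration or per-class thresholding step for the rare classes, fitted on held-out data.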
