Business Context
ZendeskX routes customer support tickets for a SaaS platform into product, billing, account, and technical queues. The team currently uses an LSTM baseline, but ticket volume has grown and long messages containing multiple issues are frequently misrouted. Your task is to evaluate whether a Transformer-based approach is a better production choice than RNN- or CNN-based text models.
Data
- Volume: 420,000 historical English support tickets
- Text length: 8-1,200 tokens (median: 96)
- Labels: 4 routing classes; mildly imbalanced (technical 41%, billing 24%, account 19%, product 16%)
- Format: Subject + body, with HTML fragments, signatures, URLs, stack traces, and occasional copied chat logs
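Preprocessing for this kind of noisy text often starts with a few regex passes. A minimal sketch is below; `clean_ticket` is an illustrative name, and the signature and stack-trace patterns are assumptions about the ticket format, not a description of the real pipeline.

```python
import re

def clean_ticket(subject: str, body: str) -> str:
    """Normalize a noisy support ticket into plain text (illustrative sketch)."""
    text = f"{subject}\n{body}"
    # Strip HTML fragments left over from rich-text email clients.
    text = re.sub(r"<[^>]+>", " ", text)
    # Mask URLs so they don't explode the vocabulary.
    text = re.sub(r"https?://\S+", " <URL> ", text)
    # Drop common signature lines (assumed patterns; tune to real data).
    text = re.sub(r"(?m)^\s*(--|Best regards|Thanks),?.*$", " ", text)
    # Collapse Python-style stack traces into a single placeholder token.
    text = re.sub(
        r"(?s)Traceback \(most recent call last\).*?(?=\n\S|\Z)",
        " <STACKTRACE> ",
        text,
    )
    # Collapse runs of whitespace introduced by the substitutions above.
    return re.sub(r"\s+", " ", text).strip()
```

Keeping placeholders like `<URL>` and `<STACKTRACE>` instead of deleting the spans preserves a routing signal (stack traces strongly suggest the technical queue) without feeding raw noise to the tokenizer.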
Success Criteria
A strong solution should achieve macro-F1 >= 0.86, improve recall on long tickets (>256 tokens) versus the LSTM baseline, and keep p95 inference latency under 120 ms per ticket in batch scoring.
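Both the macro-F1 target and the long-ticket comparison can be checked with a few lines of plain Python. The helpers below (`per_class_f1` and `f1_by_length_bucket` are illustrative names, not an existing API) use the 256-token cutoff from the criteria:

```python
from collections import defaultdict

def per_class_f1(y_true, y_pred, labels):
    """Return per-class F1 scores and their unweighted mean (macro-F1)."""
    scores = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores[c] = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return scores, sum(scores.values()) / len(labels)

def f1_by_length_bucket(y_true, y_pred, n_tokens, labels, cutoff=256):
    """Report macro-F1 separately for short (<=cutoff) and long (>cutoff) tickets."""
    buckets = defaultdict(lambda: ([], []))
    for t, p, n in zip(y_true, y_pred, n_tokens):
        key = "long" if n > cutoff else "short"
        buckets[key][0].append(t)
        buckets[key][1].append(p)
    return {k: per_class_f1(ts, ps, labels)[1] for k, (ts, ps) in buckets.items()}
```

Macro-F1 (rather than accuracy or micro-F1) is the right headline metric here because the classes are imbalanced: a model that over-predicts the 41% technical class would score well on accuracy while failing the minority queues.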
Constraints
- Deployment target is a single T4 GPU for training and CPU inference in production
- Model artifact should stay under 500 MB
- Explainability is required at a practical level for misrouted tickets
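One practical, model-agnostic way to satisfy the explainability constraint is occlusion: re-score a misrouted ticket with each token removed and rank tokens by the resulting score drop. The sketch below assumes a hypothetical `score_fn` that maps a token list to per-class probabilities; `toy_score` is a stand-in for the real model's prediction function.

```python
def occlusion_importance(tokens, score_fn, target_class):
    """Rank tokens by how much removing each one lowers the score of
    target_class (simple model-agnostic occlusion explanation)."""
    base = score_fn(tokens)[target_class]
    impacts = []
    for i in range(len(tokens)):
        reduced = tokens[:i] + tokens[i + 1:]  # drop token i
        impacts.append((tokens[i], base - score_fn(reduced)[target_class]))
    # Largest score drop first: those tokens drove the routing decision.
    return sorted(impacts, key=lambda kv: kv[1], reverse=True)

# Hypothetical keyword scorer standing in for the real model's predict function.
def toy_score(tokens):
    n = len(tokens) or 1
    return {
        "billing": sum(t in {"invoice", "charge"} for t in tokens) / n,
        "technical": sum(t in {"error", "crash"} for t in tokens) / n,
    }
```

For a CPU-deployed model this costs one extra forward pass per token, which is acceptable for offline triage of misrouted tickets even if it would be too slow for the main scoring path.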
Requirements
- Build a multi-class text classification pipeline for ticket routing.
- Compare a Transformer approach against an RNN or CNN baseline and explain the architectural differences in practical NLP terms.
- Implement realistic preprocessing for noisy support text.
- Fine-tune a modern pretrained model in Python and report evaluation metrics by class and by text-length bucket.
- Explain why Transformers handle long-range dependencies differently from RNNs and CNNs, and discuss trade-offs in compute, parallelism, and context modeling.
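The long-range-dependency difference is visible directly in the attention weight matrix: self-attention connects every pair of positions in a single step, whereas an RNN must carry a signal through O(n) sequential state updates and a CNN through stacked local windows. A minimal single-head scaled dot-product attention in NumPy (random, untrained weights, purely illustrative):

```python
import numpy as np

def self_attention(x, rng=None):
    """Single-head scaled dot-product attention over x of shape (n, d).

    Every output row mixes information from every input row in one step;
    the (n, n) weight matrix is computed for all position pairs in parallel.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    n, d = x.shape
    # Random projection matrices (a real model would learn these).
    wq, wk, wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(d)                   # (n, n): all pairs at once
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over positions
    return weights @ v, weights
```

Because every entry of `weights` is strictly positive, token 0 influences token n-1 in a single layer; the trade-off is the O(n^2) score matrix, which is why the 1,200-token tail of this dataset makes truncation or long-context variants a real design decision.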