Business Context
LexiDesk, a SaaS customer support platform, wants junior ML engineers to explain when to use a language model versus a text classifier in production. The goal is to compare the two approaches on the same support-ticket corpus and show how their objectives, outputs, and evaluation differ.
Data
You are given 180,000 English support tickets collected over 12 months from an e-commerce help center.
- Text length: 8-220 words per ticket, median 42 words
- Labels for classification: billing, shipping, account, technical, returns
- Label distribution: moderately imbalanced; shipping and billing together make up ~58%
- Language: English only after filtering non-English tickets
- Additional unlabeled text: 2.4M historical ticket messages for language-model pretraining or prompting experiments
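Before training anything, it is worth verifying the stated class imbalance on the labeled split. The snippet below is a minimal sketch using a hypothetical 100-ticket sample whose proportions mirror the brief; the real corpus has 180,000 tickets.

```python
from collections import Counter

# Hypothetical labels standing in for the real 180k-ticket corpus,
# with proportions matching the brief (~58% shipping + billing).
labels = (
    ["shipping"] * 31 + ["billing"] * 27 + ["account"] * 16
    + ["technical"] * 14 + ["returns"] * 12
)

counts = Counter(labels)
total = sum(counts.values())

# Per-class shares, largest first, to confirm the imbalance.
for label, n in counts.most_common():
    print(f"{label:10s} {n / total:.0%}")

# Combined share of the two majority classes.
top_two = (counts["shipping"] + counts["billing"]) / total
```

A check like this motivates using macro-F1 (which weights all five classes equally) rather than accuracy, which the majority classes would dominate.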
Success Criteria
A strong solution should clearly demonstrate that:
- a language model predicts or generates text based on token context,
- a text classifier assigns one of a fixed set of labels,
- the classifier achieves macro-F1 >= 0.84 on held-out tickets,
- the language model is evaluated with an appropriate generative metric such as perplexity.
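The two evaluation regimes can be made concrete with small reference implementations. Below is a sketch of both metrics in plain Python: macro-F1 for the classifier (unweighted mean of per-class F1, so minority classes count equally) and perplexity for the language model (exponentiated mean negative log-likelihood per token). In practice you would use a library implementation; the toy inputs here are illustrative, not from the real corpus.

```python
import math

def macro_f1(y_true, y_pred, classes):
    """Unweighted mean of per-class F1 scores."""
    f1s = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

def perplexity(token_log_probs):
    """exp of the mean negative log-likelihood per token."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

# Toy classifier predictions over three of the five ticket classes.
y_true = ["billing", "shipping", "billing", "returns"]
y_pred = ["billing", "shipping", "returns", "returns"]
clf_score = macro_f1(y_true, y_pred, ["billing", "shipping", "returns"])

# Sanity check: a model that assigns uniform probability 1/4 to every
# true next token over a 4-word vocabulary has perplexity exactly 4.
lm_score = perplexity([math.log(0.25)] * 10)
```

Note the asymmetry: macro-F1 needs gold labels and hard predictions, while perplexity needs only the probabilities the model assigns to the observed tokens.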
Constraints
- Inference for classification must be <50 ms per ticket in batch serving
- Solution must run on a single T4 GPU or CPU fallback
- Use modern Python tooling and a realistic preprocessing pipeline
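The 50 ms budget should be checked against wall-clock measurements of the batched predict path, not estimated. The harness below is a minimal sketch; `predict_fn` and the dummy classifier are hypothetical stand-ins for the real serving code.

```python
import statistics
import time

def measure_batch_latency(predict_fn, batches, runs=5):
    """Median per-ticket latency in ms across several timed runs."""
    per_ticket_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        n_tickets = 0
        for batch in batches:
            predict_fn(batch)
            n_tickets += len(batch)
        elapsed = time.perf_counter() - start
        per_ticket_ms.append(1000 * elapsed / n_tickets)
    return statistics.median(per_ticket_ms)

# Hypothetical stand-in for the trained classifier.
dummy_predict = lambda batch: ["billing"] * len(batch)
batches = [["my order never arrived"] * 64 for _ in range(10)]

latency_ms = measure_batch_latency(dummy_predict, batches)
```

Measuring per-ticket latency over full batches (rather than single tickets) matches the stated batch-serving deployment mode.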
Requirements
- Build a multi-class text classifier for ticket routing.
- Build or fine-tune a small language model that models ticket text.
- Explain the difference in objective functions, outputs, and downstream use cases.
- Compare preprocessing, training, and evaluation for both systems.
- Provide Python code for preprocessing, training, and evaluation.
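The objective-function difference asked for above can be shown in a few lines. Both systems minimize cross-entropy, but over different supports: the classifier over a fixed five-label set, once per ticket; the language model over the vocabulary, once per token position. The probabilities below are hypothetical, chosen only to illustrate the computation.

```python
import math

# Classifier objective: cross-entropy between the predicted label
# distribution and the single gold label -- one loss term per ticket.
label_probs = {"billing": 0.7, "shipping": 0.2, "account": 0.1}
gold_label = "billing"
clf_loss = -math.log(label_probs[gold_label])

# Language-model objective: mean cross-entropy of the probabilities
# assigned to each true next token -- one loss term per position.
# Hypothetical probabilities for the tokens of a short ticket.
next_token_probs = [0.5, 0.3, 0.8, 0.6]
lm_loss = -sum(math.log(p) for p in next_token_probs) / len(next_token_probs)
```

This is the core distinction the brief asks candidates to articulate: the classifier's output space is the label set (so its loss is a single term), while the language model's output space is the vocabulary at every position (so its loss, and hence perplexity, is averaged over tokens).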