HelpDeskPro is replacing a legacy TF-IDF ticket router with a Transformer-based classifier for customer support tickets. Before deployment, the NLP team wants candidates to demonstrate practical understanding of self-attention and how it affects implementation, interpretability, and performance.
You are given 850,000 historical English support tickets from SaaS customers. Each ticket contains a subject and body, with lengths ranging from 8 to 900 tokens (median: 96). Labels cover 6 routing queues: Billing, Account Access, Bug Report, Feature Request, Integration, and Other. The class distribution is moderately imbalanced, with Bug Report and Account Access making up 58% of all tickets.
A strong solution should correctly explain self-attention both mathematically and operationally, connect it to token interactions in support text, and show how to implement and inspect it in a modern Transformer pipeline. A solution is good enough if the proposed model reaches at least 0.84 macro-F1 on a held-out set while keeping p95 inference latency under 120 ms for sequences up to 256 tokens.
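As a concrete reference point for the self-attention requirement, single-head scaled dot-product attention can be sketched in a few lines of NumPy. This is an illustrative sketch only: the function names, toy shapes, and random projection matrices below are assumptions for demonstration, not part of the brief. Returning the attention weights alongside the output is what makes the inspection step possible (e.g., checking which subject tokens a ticket's body tokens attend to).

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max before exponentiating for numerical stability.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention (illustrative sketch).

    X:          (seq_len, d_model) token embeddings
    Wq, Wk, Wv: (d_model, d_k) learned projection matrices
    Returns the attended output and the (seq_len, seq_len) attention
    weights, which can be inspected for interpretability.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # pairwise token compatibilities
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V, weights

# Toy example: 4 "tokens" with 8-dim embeddings; all values are random.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out, attn = self_attention(X, Wq, Wk, Wv)
```

In a production pipeline this logic lives inside the Transformer library (e.g., as multi-head attention with masking), but candidates who can write and read this core computation can usually also reason about its O(n²) cost for long tickets and about which attention maps to surface for queue-routing audits.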