ZendeskPro wants to automate routing of inbound enterprise support tickets. The team is deciding whether to deploy a large language model prompted via in-context learning, or to fine-tune a smaller transformer for stable production classification.
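To make the in-context-learning option concrete, here is a minimal sketch of how a few-shot routing prompt might be assembled. The example tickets and the prompt wording are illustrative assumptions, not part of the brief; only the six queue names come from the scenario.

```python
# Few-shot routing prompt for the in-context-learning approach.
# QUEUES comes from the brief; FEW_SHOT examples are invented placeholders.
QUEUES = ["Billing", "Technical Bug", "Account Access",
          "Feature Request", "Compliance", "Sales"]

FEW_SHOT = [
    ("I was charged twice for my subscription this month.", "Billing"),
    ("The export endpoint returns HTTP 500 since last night.", "Technical Bug"),
]

def build_routing_prompt(ticket_body: str) -> str:
    """Assemble instruction + labeled examples + the new ticket,
    leaving the final 'Queue:' open for the model to complete."""
    lines = [f"Classify the support ticket into one of: {', '.join(QUEUES)}."]
    for body, queue in FEW_SHOT:
        lines.append(f"Ticket: {body}\nQueue: {queue}")
    lines.append(f"Ticket: {ticket_body}\nQueue:")
    return "\n\n".join(lines)
```

In production this string would be sent to whichever hosted or local LLM the team evaluates; the run-to-run consistency concern noted below stems from sampling and prompt sensitivity in exactly this setup.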
You have 420,000 historical English support tickets labeled into 6 routing queues: Billing, Technical Bug, Account Access, Feature Request, Compliance, and Sales. Ticket bodies range from 10 to 900 words (median 120), often include pasted logs, order IDs, URLs, and product names. Class distribution is moderately imbalanced: Technical Bug 34%, Billing 22%, Account Access 18%, Feature Request 14%, Compliance 7%, Sales 5%. About 3% of labels are noisy due to manual reassignment after first triage.
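If the team fine-tunes, the moderate imbalance above suggests weighting the loss toward the rare Compliance and Sales queues. A minimal sketch, using the class shares stated in the brief; the inverse-frequency scheme (mirroring scikit-learn's "balanced" heuristic) is an assumption, not a mandated choice.

```python
# Inverse-frequency class weights for the stated queue distribution.
# QUEUE_SHARE values are taken directly from the brief.
QUEUE_SHARE = {
    "Technical Bug": 0.34,
    "Billing": 0.22,
    "Account Access": 0.18,
    "Feature Request": 0.14,
    "Compliance": 0.07,
    "Sales": 0.05,
}

def balanced_class_weights(shares: dict) -> dict:
    """Weight each class by 1 / (n_classes * share), so rare queues
    contribute proportionally more to the training loss."""
    k = len(shares)
    return {name: round(1.0 / (k * p), 3) for name, p in shares.items()}

weights = balanced_class_weights(QUEUE_SHARE)
# Sales gets the largest weight (≈ 3.333), Technical Bug the smallest (≈ 0.49).
```

The ~3% label noise argues for pairing this with a noise-tolerant choice such as label smoothing, since hard upweighting of rare classes also amplifies their mislabeled examples.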
A production-ready solution should achieve at least 0.88 macro-F1, keep p95 inference latency below 150 ms per ticket, and produce consistent predictions across repeated runs. The recommendation should also explain when in-context learning remains preferable despite its lower consistency.
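The two quantitative gates can be checked with a short stdlib-only sketch. The per-class F1 scores and latency samples below are illustrative placeholders; only the 0.88 and 150 ms thresholds come from the brief.

```python
from statistics import quantiles

def macro_f1(per_class_f1: list) -> float:
    """Macro-F1 averages per-class F1 unweighted, so the 5%-share Sales
    queue counts as much as the 34%-share Technical Bug queue."""
    return sum(per_class_f1) / len(per_class_f1)

def p95(latencies_ms: list) -> float:
    """95th percentile via statistics.quantiles (n=20 gives 19 cut
    points; the last one is the 95th percentile)."""
    return quantiles(latencies_ms, n=20)[-1]

def meets_gates(per_class_f1, latencies_ms) -> bool:
    """True only if both acceptance thresholds from the brief hold."""
    return macro_f1(per_class_f1) >= 0.88 and p95(latencies_ms) < 150.0
```

Note that macro averaging is what makes the minority queues decisive: a classifier that ignores Sales caps macro-F1 near 0.83 even with perfect scores elsewhere, so the 0.88 bar cannot be met by majority-class performance alone.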