Business Context
OpenText wants to automatically route incoming customer support tickets submitted through OpenText Experience Cloud and support portals. Today, tickets are manually triaged into queues such as billing, product defect, access issue, and feature request, which slows response times and creates inconsistent routing.
Data
You are given 850,000 historical support tickets collected over 18 months.
- Task: Predict one of 5 ticket intent labels: billing, technical_issue, access_management, feature_request, general_inquiry
- Text fields: subject + description + optional product name
- Text length: 8-600 words, median 72 words
- Language: English only for v1
- Label distribution: moderately imbalanced; technical_issue is ~42%, feature_request is ~9%
- Noise: HTML fragments, email signatures, ticket IDs, stack traces, repeated reply chains, and copied KB links
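As a hedged illustration of how the noise sources listed above might be handled, the sketch below uses only the Python standard library. Every pattern in it is an assumption made for illustration: the ticket ID shape (e.g. "TKT-48211"), the signature phrases, the reply headers, and the stack trace formats are hypothetical and would need to be tuned against real tickets. Placeholder tokens such as [URL] are kept rather than deleted so that a downstream model can still see that a link or stack trace was present.

```python
import html
import re

# All patterns are illustrative assumptions, not confirmed ticket formats.
BREAK_RE = re.compile(r"<\s*(?:br\s*/?|/p|/div)\s*>", re.IGNORECASE)  # tags that end a line
TAG_RE = re.compile(r"<[^>]+>")                                       # remaining HTML fragments
URL_RE = re.compile(r"https?://\S+")                                  # copied KB links and other URLs
TICKET_ID_RE = re.compile(r"\b[A-Z]{2,5}-\d{3,}\b")                   # hypothetical ID shape
STACK_LINE_RE = re.compile(
    r'^\s*(?:at\s+\S+\(.*\)|File ".*", line \d+.*|Traceback \(most recent call last\):)\s*$'
)
REPLY_HEADER_RE = re.compile(
    r"^\s*(?:On .+ wrote:|-{2,}\s*Original Message\s*-{2,})\s*$", re.IGNORECASE
)
SIGNATURE_RE = re.compile(
    r"^\s*(?:--\s*|(?:best|kind)?\s*regards,?|thanks,?|sent from my .+)\s*$", re.IGNORECASE
)


def clean_ticket(text: str) -> str:
    """Return a normalized single-line version of a raw ticket body."""
    # Turn line-ending tags into newlines, strip other tags, decode entities.
    text = BREAK_RE.sub("\n", text)
    text = html.unescape(TAG_RE.sub(" ", text))

    kept = []
    in_stack = False
    for line in text.splitlines():
        if line.lstrip().startswith(">"):
            continue  # quoted line from an earlier reply
        if REPLY_HEADER_RE.match(line) or SIGNATURE_RE.match(line):
            break  # drop the signature or reply chain and everything after it
        if STACK_LINE_RE.match(line):
            if not in_stack:
                kept.append("[STACK_TRACE]")  # one placeholder per consecutive run
                in_stack = True
            continue
        in_stack = False
        kept.append(line)

    text = " ".join(kept)
    text = URL_RE.sub("[URL]", text)
    text = TICKET_ID_RE.sub("[TICKET_ID]", text)
    return re.sub(r"\s+", " ", text).strip()
```

Truncating at the first signature or reply header is a deliberately aggressive choice; it risks cutting real content when a customer writes "Thanks," mid-message, so the rule should be checked against a labeled sample before it is relied on.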
Success Criteria
A production-ready solution should achieve macro-F1 >= 0.84 overall and recall >= 0.90 for access_management. It should support batch or near-real-time inference with latency under 150 ms per ticket on standard GPU-backed infrastructure.
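The two quality thresholds can be checked with a few lines of plain Python. The following is a minimal sketch, assuming predictions and gold labels arrive as parallel lists of label strings; the function names `per_class_scores` and `meets_success_criteria` are hypothetical helpers, and a real pipeline would more likely rely on an established metrics library.

```python
from collections import Counter

LABELS = [
    "billing",
    "technical_issue",
    "access_management",
    "feature_request",
    "general_inquiry",
]


def per_class_scores(y_true, y_pred, labels=LABELS):
    """Compute precision, recall, and F1 for each label."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fn[truth] += 1  # the true class was missed
            fp[pred] += 1   # the predicted class was wrongly assigned
    scores = {}
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        f_den = precision + recall
        f1 = 2 * precision * recall / f_den if f_den else 0.0
        scores[label] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


def meets_success_criteria(y_true, y_pred, min_macro_f1=0.84, min_access_recall=0.90):
    """Check the macro-F1 and access_management recall targets together."""
    scores = per_class_scores(y_true, y_pred)
    # Macro-F1 is the unweighted mean of per-class F1, so rare classes count equally.
    macro_f1 = sum(s["f1"] for s in scores.values()) / len(scores)
    access_recall = scores["access_management"]["recall"]
    return {
        "macro_f1": macro_f1,
        "access_recall": access_recall,
        "passed": macro_f1 >= min_macro_f1 and access_recall >= min_access_recall,
    }
```

Because macro-F1 averages classes without weighting by frequency, a model that does well on technical_issue but poorly on feature_request is penalized more than it would be under accuracy or micro-F1.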
Constraints
- Data must remain inside OpenText-managed infrastructure
- The model should be explainable enough for support operations review
- The pipeline must handle class imbalance and evolving ticket vocabulary across products
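One common way to address the class imbalance constraint is to weight the training loss inversely to class frequency. The sketch below is an assumption about approach, not a prescribed method; it uses the same heuristic as scikit-learn's `class_weight="balanced"` option, n_samples / (n_classes * class_count). The helper names are hypothetical. Only the two class shares stated in the brief (~42% and ~9%) are used as inputs; the other three shares are not given and are not guessed here.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


def weight_from_share(share, n_classes=5):
    """Same heuristic expressed in terms of a class's share of the data."""
    return 1.0 / (n_classes * share)


# Shares taken from the stated label distribution.
print(round(weight_from_share(0.42), 3))  # technical_issue
print(round(weight_from_share(0.09), 3))  # feature_request
```

Under this heuristic a feature_request example would carry roughly 4.7 times the loss weight of a technical_issue example (0.42 / 0.09). Whether that much reweighting helps macro-F1 is an empirical question to settle on the validation split.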
Requirements
- Design an end-to-end NLP pipeline for multi-class text classification.
- Define preprocessing for noisy enterprise support text.
- Implement a strong baseline and a transformer-based production candidate in Python.
- Explain your train/validation/test strategy and how you would prevent leakage from ticket threads.
- Describe evaluation metrics, thresholding, and error analysis.
- State how you would monitor drift and retrain the model after deployment.
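For the leakage requirement, one possible approach is to assign whole threads, rather than individual tickets, to a split. The sketch below assumes each ticket record carries a `thread_id` field; that field name and the 80/10/10 proportions are assumptions made for illustration. Hashing the thread ID makes the assignment deterministic, so a thread stays in the same split when the dataset is re-exported or extended.

```python
import hashlib


def assign_split(thread_id: str, val_pct: int = 10, test_pct: int = 10) -> str:
    """Map a thread ID to a split via a stable hash bucket in [0, 100)."""
    digest = hashlib.sha256(thread_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    if bucket < test_pct:
        return "test"
    if bucket < test_pct + val_pct:
        return "validation"
    return "train"


def split_tickets(tickets):
    """Group tickets into splits so that no thread spans two splits."""
    splits = {"train": [], "validation": [], "test": []}
    for ticket in tickets:
        splits[assign_split(ticket["thread_id"])].append(ticket)
    return splits


# Toy usage: 500 synthetic tickets spread over 50 threads.
toy = [{"thread_id": f"T{i % 50}", "text": f"message {i}"} for i in range(500)]
parts = split_tickets(toy)
print({name: len(rows) for name, rows in parts.items()})
```

A hash-based split controls thread leakage but not temporal leakage. Since the data spans 18 months and vocabulary is expected to evolve, a reasonable variant is to also hold out the most recent weeks as a time-based test set.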