You are improving an NLP pipeline for an enterprise support platform that routes ~200,000 customer tickets per week into 12 issue categories such as billing, account access, export failures, and API errors. The current keyword-based system performs poorly because tickets contain noisy text, product jargon, stack traces, OCR-extracted text from screenshots, and short follow-up replies like “still broken after cache clear.” You have 800,000 historical labeled tickets in English, with moderate class imbalance and frequent vocabulary drift after product releases. The team wants a feature-engineered baseline that is interpretable, fast to retrain, and strong enough to benchmark against transformer models.
How would you design the feature engineering pipeline for this text classification problem, including preprocessing, representation choices, and evaluation? As the system evolves, how would you decide which engineered features are worth keeping?