You are building an NLP workflow for a digital customer support platform that receives about 80,000 inbound emails each week. Today, agents manually read and tag messages into intents such as billing issue, delivery delay, cancellation request, account access, and product complaint before routing them to downstream queues. You have roughly 300,000 historical emails with noisy human-assigned labels, message lengths ranging from one-line requests to multi-paragraph complaints, and a mix of English plus some code-mixed Hindi-English text. The business wants an automated text classification system that can improve routing consistency and reduce manual triage effort.
What is text classification, and how would you design and implement a practical text classification pipeline for this email-routing use case, including preprocessing, model choice, training, and evaluation?
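As a starting point for discussion, the sketch below shows the smallest end-to-end version of such a pipeline: tokenize, train a bag-of-words multinomial Naive Bayes classifier, predict an intent, and score with macro-averaged F1 (a reasonable metric when intent classes are imbalanced). It is a baseline illustration only, not a recommended production design. The example emails, the label names, the `tokenize` helper, and the `NaiveBayesIntentClassifier` class are all hypothetical, and the tokenizer assumes the Hindi-English text is written in Latin script.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split into alphanumeric tokens.

    Assumption: code-mixed text is romanized (Latin script); Devanagari
    characters would be dropped by this pattern.
    """
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesIntentClassifier:
    """Multinomial Naive Bayes over word counts with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = len(labels)
        return self

    def log_scores(self, text):
        tokens = [t for t in tokenize(text) if t in self.vocab]  # skip unseen tokens
        vocab_size = len(self.vocab)
        scores = {}
        for label, n_docs in self.class_counts.items():
            total = sum(self.token_counts[label].values())
            score = math.log(n_docs / self.total_docs)  # class prior
            for tok in tokens:
                count = self.token_counts[label][tok]
                score += math.log((count + self.alpha) / (total + self.alpha * vocab_size))
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare intents count as much as common ones."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_values = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1_values.append(2 * precision * recall / denom if denom else 0.0)
    return sum(f1_values) / len(f1_values)


# Hypothetical toy training set; real training would use the historical emails.
TRAIN = [
    ("I was charged twice on my invoice this month", "billing_issue"),
    ("Refund the extra charge on my bill", "billing_issue"),
    ("My order has not arrived, delivery is late", "delivery_delay"),
    ("Parcel abhi tak nahi aaya, delivery kab hogi", "delivery_delay"),
    ("Please cancel my subscription", "cancellation_request"),
    ("Mujhe apna order cancel karna hai", "cancellation_request"),
    ("I cannot log in, password reset not working", "account_access"),
    ("Account locked ho gaya, login nahi ho raha", "account_access"),
]

texts, labels = zip(*TRAIN)
model = NaiveBayesIntentClassifier().fit(texts, labels)
print(model.predict("delivery late hai, parcel kab aayega"))
```

A baseline like this is mainly useful as a reference point: any stronger model (for example a linear classifier on TF-IDF features or a fine-tuned multilingual transformer) should beat its macro F1 on a held-out split before it is considered for routing.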