You are building an NLP system for an online grocery and membership platform that receives customer feedback from app reviews, post-order surveys, chat transcripts, and support emails. The business wants each message tagged into topics such as delivery experience, product quality, substitutions, pricing, membership, website or app issues, and customer service so teams can prioritize fixes and track trends. You have roughly 300,000 historical messages, but only 40,000 are manually labeled; the labels are somewhat noisy, and many messages mention several issues in the same note. Feedback ranges from short fragments like “late again” to multi-sentence complaints with order details, promo codes, and references to specific pantry or produce items.
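To make the tagging target concrete, here is a minimal sketch of one possible label representation: a multi-hot vector over the seven topics named above, so a single message can carry several tags. The snake_case identifiers, the `to_multi_hot` helper, and the example tags are illustrative assumptions, not part of the business requirements.

```python
# Topic names come from the scenario; the snake_case identifiers are assumed.
TOPICS = [
    "delivery_experience",
    "product_quality",
    "substitutions",
    "pricing",
    "membership",
    "website_or_app_issues",
    "customer_service",
]


def to_multi_hot(tags):
    """Encode a collection of topic tags as a 0/1 list aligned with TOPICS."""
    unknown = set(tags) - set(TOPICS)
    if unknown:
        raise ValueError(f"unknown topics: {sorted(unknown)}")
    return [1 if topic in tags else 0 for topic in TOPICS]


# Hypothetical tags for a message that mentions both a late delivery
# and an unwanted substitution; not drawn from the real data.
example_tags = {"delivery_experience", "substitutions"}
example_vector = to_multi_hot(example_tags)
```

A message with no recognizable topic maps to an all-zero vector under this encoding, which keeps "no tag" distinct from any single topic.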
How would you design and implement a topic classification system for this feedback, including your preprocessing pipeline, modeling approach, and evaluation strategy, and how would you handle ambiguous or multi-topic messages in production?