You are working on a subscription-based digital product that collects customer feedback from app reviews, support tickets, post-purchase surveys, and free-text NPS responses. The business team wants to turn roughly 300,000 comments collected over the last 18 months into clear insights about recurring pain points, feature requests, and satisfaction drivers. The data is noisy: comments range from three-word fragments to multi-sentence complaints and contain emojis, spelling errors, duplicate submissions, and a small but important share of Spanish-language text. You have only a limited amount of manually labeled data for sentiment and issue categories, but stakeholders still expect a practical approach that surfaces trends they can act on.
How would you design an NLP workflow to derive reliable insights from this feedback corpus, including how you would preprocess the text, choose models, evaluate quality, and present outputs that business stakeholders can use for prioritization?