Business Context
BrightWave, a subscription software company, collects post-interaction CSAT scores and optional free-text comments after support tickets are resolved. The customer insights team wants an NLP pipeline that explains why satisfaction scores move by extracting sentiment, recurring themes, and actionable complaint categories from open-ended responses.
Data
- Volume: 420,000 historical survey responses collected over 18 months
- Text length: 5-180 words per response, median 24 words
- Language: English only for the first version
- Structured fields: CSAT score (1-5), support channel, product area, account segment, timestamp
- Label availability: 18,000 responses manually tagged into issue categories; the rest are unlabeled
- Distribution: CSAT is skewed positive; comments are sparse and noisy, with many short fragments such as "slow response" or "great rep, bad bug fix"
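A loading step can enforce the structured-field schema above before anything downstream runs. This is a minimal sketch; the column names and the CSV export format are assumptions, since the brief does not specify the real schema.

```python
import pandas as pd

# Hypothetical column names; the real export schema may differ.
EXPECTED_COLUMNS = {
    "csat_score",       # integer 1-5
    "comment",          # optional free text, may be missing
    "support_channel",
    "product_area",
    "account_segment",
    "timestamp",
}

def load_survey_responses(path) -> pd.DataFrame:
    """Load the survey export and validate the structured fields."""
    df = pd.read_csv(path, parse_dates=["timestamp"])
    missing = EXPECTED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(f"export is missing columns: {sorted(missing)}")
    # CSAT must fall in 1..5; drop malformed rows rather than guess.
    return df[df["csat_score"].between(1, 5)]
```

Validating at load time keeps bad rows out of every later stage instead of failing deep inside modeling code.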
Success Criteria
A good solution should identify the main drivers of low and high satisfaction, produce stable topic/category summaries by product area, and achieve macro-F1 ≥ 0.78 on the manually labeled issue categories. Sentiment signals should show a clear monotonic relationship with CSAT (for example, a positive rank correlation) and support weekly reporting.
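The two quantitative criteria can be checked with a small helper. The 0.78 threshold comes from the brief; using Spearman rank correlation to operationalize the sentiment-CSAT relationship is an assumption, chosen because it captures monotonic association without assuming linearity on a 1-5 scale.

```python
from sklearn.metrics import f1_score
from scipy.stats import spearmanr

def check_success_criteria(y_true, y_pred, sentiment_scores, csat_scores,
                           f1_threshold=0.78):
    """Evaluate the brief's two quantitative targets.

    Macro-F1 averages per-class F1, so rare issue categories count
    as much as common ones -- important given the skewed distribution.
    """
    macro_f1 = f1_score(y_true, y_pred, average="macro")
    rho, p_value = spearmanr(sentiment_scores, csat_scores)
    return macro_f1 >= f1_threshold, rho, p_value
```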
Constraints
- Inference should process 100,000 comments in under 30 minutes as a batch job
- Outputs must be explainable to non-technical stakeholders
- Personally identifiable information should be removed before modeling
- The team prefers a lightweight Python stack that can run on a single CPU machine, with an optional transformer baseline for comparison
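The PII constraint can be partially met with pattern-based scrubbing before modeling. This sketch only catches structured PII (emails, phone numbers, ID-like tokens); names and addresses would need an NER pass on top, e.g. with spaCy. The `TICKET-`/`ACCT-` ID format is an assumed example, not something stated in the brief.

```python
import re

# Ordered (pattern, placeholder) pairs for structured PII.
PII_PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "[EMAIL]"),
    (re.compile(r"\+?\d[\d\s().-]{7,}\d"), "[PHONE]"),
    # Assumed internal ID format for illustration only.
    (re.compile(r"\b(?:ACCT|TICKET)-\d+\b", re.IGNORECASE), "[ID]"),
]

def scrub_pii(text: str) -> str:
    """Replace structured PII with placeholder tokens before modeling."""
    for pattern, token in PII_PATTERNS:
        text = pattern.sub(token, text)
    return text
```

Placeholder tokens (rather than deletion) preserve sentence structure, which keeps downstream n-gram features and topic models stable.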
Requirements
- Build a pipeline to clean and normalize survey comments.
- Predict sentiment and/or issue categories from text.
- Surface recurring themes from unlabeled responses.
- Quantify how themes and sentiment relate to CSAT scores.
- Describe preprocessing, model choice, evaluation, and reporting outputs.
- Provide production-ready Python code for training and analysis.