Business Context
OpenText receives customer feedback from support surveys, renewal comments, and product experience forms across products such as OpenText Content Cloud and OpenText Experience Cloud. Design an NLP pipeline that classifies each feedback item as positive, neutral, or negative so product and support teams can track customer sentiment and escalate recurring issues.
Data
- Volume: ~1.8M historical feedback records, with ~25K new comments per day
- Text length: 5–400 words, median 42 words
- Language: English only for the first version, with occasional mixed casing, URLs, ticket IDs, and product names
- Labels: Positive (46%), Neutral (28%), Negative (26%) from analyst-reviewed survey outcomes
- Noise: Duplicate submissions, boilerplate signatures, and templated support text are present
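The noise properties above suggest a normalization-then-dedupe preprocessing step. The sketch below is one minimal stdlib approach, assuming illustrative regex patterns: the real ticket-ID format, signature wording, and record schema (`"text"` key) are assumptions, not given in the brief.

```python
import hashlib
import re

# Illustrative patterns; actual URL/ticket/signature formats in the data are assumptions.
URL_RE = re.compile(r"https?://\S+")
TICKET_RE = re.compile(r"\b[A-Z]{2,5}-\d{3,7}\b")           # hypothetical IDs like "OT-12345"
SIGNATURE_RE = re.compile(r"(?im)^(thanks|regards|best)[,.!]?.*$")  # crude signature lines

def normalize(text: str) -> str:
    """Mask URLs/ticket IDs, drop signature-like lines, collapse whitespace, lowercase."""
    text = URL_RE.sub("<url>", text)
    text = TICKET_RE.sub("<ticket>", text)
    text = SIGNATURE_RE.sub("", text)
    return re.sub(r"\s+", " ", text).strip().lower()

def dedupe(records: list[dict]) -> list[dict]:
    """Keep the first occurrence of each normalized text; drop exact re-submissions."""
    seen, kept = set(), []
    for rec in records:
        key = hashlib.sha1(normalize(rec["text"]).encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            kept.append(rec)
    return kept
```

Hashing the normalized text catches exact and near-trivial duplicates cheaply; fuzzier duplicates (templated support text with small edits) would need MinHash or embedding similarity on top.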
Success Criteria
A strong solution should achieve macro-F1 ≥ 0.84, negative-class recall ≥ 0.90, and batch-score daily feedback within the reporting SLA. The pipeline should also produce calibrated probabilities that can be used for alerting and dashboard thresholds in OpenText Analytics.
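The numeric targets above are easy to turn into an automated evaluation gate. Below is a small from-scratch sketch (pure Python, assuming string labels `"positive"`/`"neutral"`/`"negative"`); in practice the same numbers come from `sklearn.metrics`, but writing them out makes the gate explicit.

```python
LABELS = ("positive", "neutral", "negative")

def per_class_prf(y_true, y_pred, label):
    """Precision, recall, F1 for one class; 0.0 where the ratio is undefined."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the three sentiment labels."""
    return sum(per_class_prf(y_true, y_pred, lab)[2] for lab in LABELS) / len(LABELS)

def passes_gates(y_true, y_pred, macro_f1_min=0.84, neg_recall_min=0.90):
    """Success-criteria gate from the brief: macro-F1 >= 0.84 and negative recall >= 0.90."""
    neg_recall = per_class_prf(y_true, y_pred, "negative")[1]
    return macro_f1(y_true, y_pred) >= macro_f1_min and neg_recall >= neg_recall_min
```

Calibration (e.g. temperature scaling or `CalibratedClassifierCV`) would be validated separately, typically with a reliability curve and expected calibration error rather than a single threshold.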
Constraints
- Inference should support near-real-time scoring for dashboard refreshes
- The solution must run in OpenText-managed infrastructure without sending raw customer text to external APIs
- The model should be explainable enough for product managers to review common drivers of negative sentiment
Requirements
- Design an end-to-end sentiment analysis pipeline from ingestion to scored output.
- Define preprocessing steps for noisy enterprise feedback text.
- Implement a modern Python solution using a transformer baseline and a simple benchmark model.
- Explain how you would handle class imbalance, duplicate feedback, and domain-specific vocabulary.
- Specify evaluation metrics, validation strategy, and error analysis.
- Describe how scored sentiment would be exposed to downstream OpenText reporting or monitoring systems.