Business Context
FinSure, a retail investing platform, collects customer comments from app reviews, chat transcripts, post-call surveys, and email feedback. The product team wants an NLP system that classifies the sentiment of each comment as positive, neutral, or negative so they can monitor customer satisfaction and identify emerging issues with specific financial products.
Data
- Volume: 420,000 historical labeled comments and ~18,000 new comments per day
- Text length: 5-220 words (median: 34 words)
- Language: English only for the first release
- Label distribution: Positive 46%, Neutral 24%, Negative 30%
- Domain characteristics: financial vocabulary, abbreviations, and references to fees, APR, transfers, delays, compliance, and account restrictions
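The label distribution is skewed toward positive comments, so one common mitigation is to weight the loss by inverse class frequency. The sketch below turns the stated shares into approximate per-class counts and weights using w_c = N / (K * n_c), which gives 1.0 to every class in a perfectly balanced set. The weighting scheme is an assumption chosen for illustration, not a prescribed part of this brief, and the names `class_counts` and `inverse_frequency_weights` are hypothetical.

```python
"""Inverse-frequency class weights derived from the stated label distribution."""

# Label shares and total from the Data section (assumed exact for illustration).
LABEL_SHARE = {"positive": 0.46, "neutral": 0.24, "negative": 0.30}
TOTAL_LABELED = 420_000


def class_counts(shares: dict[str, float], total: int) -> dict[str, int]:
    """Approximate per-class counts implied by the label shares."""
    return {label: round(share * total) for label, share in shares.items()}


def inverse_frequency_weights(counts: dict[str, int]) -> dict[str, float]:
    """Weight each class by N / (K * n_c); a balanced dataset yields 1.0 everywhere."""
    n_total = sum(counts.values())
    k = len(counts)
    return {label: n_total / (k * n) for label, n in counts.items()}


if __name__ == "__main__":
    counts = class_counts(LABEL_SHARE, TOTAL_LABELED)
    weights = inverse_frequency_weights(counts)
    for label in LABEL_SHARE:
        print(f"{label:>8}: n={counts[label]:>7,}  weight={weights[label]:.3f}")
```

Neutral comments receive the largest weight and positive comments the smallest. If the model is trained in PyTorch, weights like these could be passed as the `weight` argument of `torch.nn.CrossEntropyLoss`, ordered to match the model's label ids; whether they help should be checked against the macro F1 and negative-recall targets rather than assumed.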
Success Criteria
A production-ready model should achieve macro F1 >= 0.86 and negative-class recall >= 0.90, because missing a dissatisfied customer is costly for both retention and compliance escalation. Batch inference should process the full daily volume (~18,000 comments) within the existing analytics pipeline, and single-comment scoring should stay under 120 ms for dashboard use.
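The two accuracy criteria can be expressed as an automated release gate, which is useful when the model is retrained weekly. The sketch below builds a confusion matrix from true and predicted labels, derives one-vs-rest precision, recall, and F1 per class, and checks macro F1 and negative recall against the stated thresholds. The function names are hypothetical, and the string labels are an assumed encoding of the three classes.

```python
"""Release gate for the stated success criteria (macro F1 and negative recall)."""
from collections import Counter

LABELS = ("positive", "neutral", "negative")
MIN_MACRO_F1 = 0.86
MIN_NEGATIVE_RECALL = 0.90


def confusion_matrix(y_true, y_pred, labels=LABELS):
    """Nested dict: rows are true labels, columns are predicted labels."""
    pairs = Counter(zip(y_true, y_pred))
    return {t: {p: pairs[(t, p)] for p in labels} for t in labels}


def per_class_scores(cm):
    """Return {label: (precision, recall, f1)} computed one-vs-rest from the matrix."""
    scores = {}
    for c in cm:
        tp = cm[c][c]
        fn = sum(cm[c].values()) - tp        # true c, predicted as another class
        fp = sum(cm[t][c] for t in cm) - tp  # predicted c, actually another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[c] = (precision, recall, f1)
    return scores


def release_gate(y_true, y_pred):
    """Return (passed, macro_f1, negative_recall) for a candidate model's predictions."""
    scores = per_class_scores(confusion_matrix(y_true, y_pred))
    macro_f1 = sum(f1 for _, _, f1 in scores.values()) / len(scores)
    negative_recall = scores["negative"][1]
    passed = macro_f1 >= MIN_MACRO_F1 and negative_recall >= MIN_NEGATIVE_RECALL
    return passed, macro_f1, negative_recall
```

A model can clear the macro F1 bar and still fail the gate on negative recall, which is why both checks are kept separate. `sklearn.metrics.classification_report` and `sklearn.metrics.confusion_matrix` report the same per-class numbers; the dependency-free version here just makes the gate logic explicit.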
Constraints
- Comments may contain PII and account-related details, so preprocessing must redact sensitive fields
- The solution must run in a secure Python environment on a single T4 GPU
- The model should be easy to retrain weekly as new labeled comments arrive
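A first-pass redaction step can replace obvious PII with typed placeholders before text reaches training data, logs, or the model. The sketch below applies a few regular expressions in order (email, card-like digit runs, North American phone numbers, then long digit runs treated as account numbers). These patterns and tag names are assumptions for illustration, not a complete PII policy; a production pipeline would need patterns reviewed against FinSure's actual data and compliance requirements.

```python
import re

# Illustrative patterns only (assumptions, not a complete PII policy).
# Order matters: specific patterns run before the generic long-digit rule.
REDACTION_RULES = [
    ("EMAIL", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
    # 13-19 digits, optionally separated by single spaces or dashes (card-like).
    ("CARD", re.compile(r"\b\d(?:[ -]?\d){12,18}\b")),
    # North American phone numbers such as (555) 123-4567 or +1 555.123.4567.
    ("PHONE", re.compile(r"(?<!\d)(?:\+?1[ .-]?)?(?:\(\d{3}\)|\d{3})[ .-]?\d{3}[ .-]?\d{4}(?!\d)")),
    # Any remaining run of 8+ digits is treated as an account identifier.
    ("ACCOUNT", re.compile(r"\b\d{8,}\b")),
]


def redact(text: str) -> str:
    """Replace matches of each rule with a typed placeholder such as [EMAIL]."""
    for tag, pattern in REDACTION_RULES:
        text = pattern.sub(f"[{tag}]", text)
    return text


if __name__ == "__main__":
    sample = ("Email jane.doe@example.com or call (555) 123-4567 about card "
              "4111 1111 1111 1111 and account 123456789012.")
    print(redact(sample))
```

Typed placeholders, rather than deletion, keep a signal the classifier can use (for example, that a complaint mentions an account) while removing the sensitive value. Short numbers such as fee amounts are deliberately left intact, because they often carry sentiment-relevant context.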
Requirements
- Build a 3-class sentiment classifier for customer comments about financial products
- Design a preprocessing pipeline for noisy, user-generated financial text
- Fine-tune a modern transformer model in Python
- Evaluate performance with class-level metrics and confusion analysis
- Explain how you would handle class imbalance, domain-specific language, and deployment constraints
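Besides loss weighting, the negative-recall target can be approached with decision-threshold adjustment: predict negative whenever the model's negative-class probability reaches a threshold t, and otherwise fall back to the usual argmax. Lowering t can only add negative predictions, so recall is non-increasing in t, and scanning a grid from high to low finds the highest threshold that still meets the target, which limits the precision cost. The sketch assumes per-comment softmax probabilities from a held-out validation set (not the test set) and a 0.01-step grid; both assumptions, and the function names, are hypothetical.

```python
"""Choose a negative-class probability threshold to meet a recall target."""


def predict_with_negative_threshold(probs, threshold):
    """probs: list of {label: probability}. Negative if P(negative) >= threshold, else argmax."""
    preds = []
    for p in probs:
        if p["negative"] >= threshold:
            preds.append("negative")
        else:
            preds.append(max(p, key=p.get))  # ordinary argmax over all classes
    return preds


def negative_recall(y_true, y_pred):
    """Fraction of truly negative comments predicted as negative."""
    hits = [pred == "negative" for true, pred in zip(y_true, y_pred) if true == "negative"]
    return sum(hits) / len(hits) if hits else 0.0


def choose_threshold(probs, y_true, target_recall=0.90, grid=None):
    """Return the highest grid threshold whose negative recall meets the target, else None."""
    grid = grid or [i / 100 for i in range(100, 0, -1)]  # 1.00, 0.99, ..., 0.01
    for t in grid:
        preds = predict_with_negative_threshold(probs, t)
        if negative_recall(y_true, preds) >= target_recall:
            return t
    return None
```

The chosen threshold should then be re-checked with the full set of class-level metrics, because extra negative predictions come from the positive and neutral classes and lower their recall. If no threshold meets the target without pushing macro F1 below 0.86, that points back to data or model changes rather than further threshold tuning.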