Business Context
Abercrombie & Fitch wants to monitor sentiment in customer feedback from post-purchase surveys, app reviews, contact-center notes, and product reviews across the Abercrombie app and Abercrombie.com. Build an NLP system that classifies feedback sentiment so CX and merchandising teams can quickly identify recurring pain points and positive drivers.
Data
You have 850,000 labeled historical feedback records collected over 18 months.
- Task: classify each feedback item as positive, neutral, or negative
- Volume: ~8,000 new records per day
- Text length: 5-400 tokens, median 42 tokens
- Language: 94% English, 6% mixed English/Spanish
- Label distribution: positive 58%, neutral 19%, negative 23%
- Noise: emojis, misspellings, order IDs, SKU codes, repeated punctuation, and channel-specific shorthand
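The noise listed above can be handled with a light, rule-based normalization pass before tokenization. The sketch below is illustrative only. The brief does not say what order IDs or SKU codes look like, so `SKU_RE` and `ORDER_ID_RE` are hypothetical patterns that would need to be checked against real records. Emojis are deliberately left in place because they often carry sentiment. Note that scikit-learn's default `TfidfVectorizer` token pattern drops them, so a baseline that should use emojis would need a custom `token_pattern`.

```python
import re

# Hypothetical formats: the brief does not specify what order IDs or SKU codes look like.
SKU_RE = re.compile(r"\b[A-Z]{2,4}-\d{3,}\b")     # e.g. "AF-12345" (assumed format)
ORDER_ID_RE = re.compile(r"#?\b\d{6,}\b")         # any run of 6+ digits, optionally prefixed by "#"
REPEATED_PUNCT_RE = re.compile(r"([!?])\1{2,}")   # "!!!!" -> "!!" (keeps some emphasis)
ELONGATED_RE = re.compile(r"([a-zA-Z])\1{2,}")    # "sooooo" -> "soo"; letters only, so prices are untouched
WHITESPACE_RE = re.compile(r"\s+")


def normalize_feedback(text: str) -> str:
    """Light normalization for noisy retail feedback; emojis and casing are left as-is."""
    text = SKU_RE.sub("<SKU>", text)            # replace SKUs first so their digits are not read as order IDs
    text = ORDER_ID_RE.sub("<ORDER_ID>", text)  # mask identifiers that carry no sentiment
    text = REPEATED_PUNCT_RE.sub(r"\1\1", text)
    text = ELONGATED_RE.sub(r"\1\1", text)
    return WHITESPACE_RE.sub(" ", text).strip()


print(normalize_feedback("Order #123456789 arrived late!!!! SKU AF-12345 was sooooo small 😡"))
# Order <ORDER_ID> arrived late!! SKU <SKU> was soo small 😡
```

Masking identifiers with placeholder tokens, rather than deleting them, keeps sentence structure intact for the transformer model. Capping elongations at two characters preserves an emphasis signal without inflating the vocabulary.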
Success Criteria
A production-ready solution should achieve:
- Macro-F1 >= 0.84 on a held-out test set
- Negative-class recall >= 0.88 to avoid missing poor customer experiences
- Daily batch scoring of the ~8,000 new records completing within 30 minutes (a sustained rate of at least ~4.4 records per second)
- Clear error analysis for sarcasm, mixed sentiment, and short texts
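The macro-F1 target and the negative-recall floor can pull in different directions, because lowering the bar for predicting "negative" raises its recall at the cost of precision. One way to reconcile them is to tune a threshold on P(negative) using validation-set probabilities, never the held-out test set. The sketch below assumes a model that outputs probabilities in the column order `positive, neutral, negative`. The decision rule (predict negative when P(negative) clears the threshold, otherwise take the likelier of the other two classes) and the search grid are illustrative choices, not a prescribed method.

```python
import numpy as np
from sklearn.metrics import f1_score, recall_score

# Assumed column order of the model's probability output.
LABELS = np.array(["positive", "neutral", "negative"])
NEG = 2


def predict_with_negative_threshold(proba, neg_threshold: float) -> np.ndarray:
    """Predict 'negative' when P(negative) >= neg_threshold, else the likelier of positive/neutral."""
    proba = np.asarray(proba)
    other = np.where(proba[:, 0] >= proba[:, 1], 0, 1)
    return LABELS[np.where(proba[:, NEG] >= neg_threshold, NEG, other)]


def choose_negative_threshold(y_true, proba, min_neg_recall=0.88, grid=None):
    """Return (threshold, macro_f1, neg_recall) maximizing macro-F1 subject to the recall floor, or None."""
    grid = np.linspace(0.05, 0.95, 91) if grid is None else grid
    best = None
    for t in grid:
        y_pred = predict_with_negative_threshold(proba, t)
        neg_recall = recall_score(y_true, y_pred, labels=["negative"], average=None)[0]
        if neg_recall < min_neg_recall:
            continue  # threshold violates the negative-recall requirement
        macro_f1 = f1_score(y_true, y_pred, average="macro", zero_division=0)
        if best is None or macro_f1 > best[1]:
            best = (float(t), float(macro_f1), float(neg_recall))
    return best
```

The chosen threshold is then frozen and applied once to the test set, where both macro-F1 and negative recall are reported. The same predictions can be sliced by short texts, texts flagged as mixed sentiment, and suspected sarcasm for the error analysis.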
Constraints
- Use Python with a modern NLP stack
- Model must be trainable on a single GPU and must run batch inference on CPU
- Predictions should be explainable enough for business review
- Do not rely on external customer data beyond the provided corpus
Requirements
- Design a preprocessing pipeline for noisy retail feedback text.
- Build and compare a strong baseline (e.g., TF-IDF + linear classifier) and a transformer-based model.
- Explain how you would fine-tune the model, handle class imbalance, and choose decision thresholds.
- Provide an evaluation plan with appropriate metrics, validation strategy, and error analysis.
- Describe how you would surface sentiment trends by channel, product category, and time period for Abercrombie stakeholders.
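For stakeholder reporting, scored records can be rolled up into sentiment shares per channel, product category, and time period. The sketch below assumes a scored DataFrame with hypothetical columns `created_at` (datetime), `channel`, `product_category`, and `sentiment`. It uses net sentiment (positive share minus negative share) as one possible headline metric and weekly buckets as one possible period.

```python
import pandas as pd


def sentiment_trends(df: pd.DataFrame, freq: str = "W") -> pd.DataFrame:
    """Per-period sentiment shares, volume, and net sentiment by channel and product category."""
    df = df.copy()
    df["period"] = df["created_at"].dt.to_period(freq).dt.start_time  # bucket start date
    counts = (
        df.groupby(["period", "channel", "product_category", "sentiment"])
        .size()
        .unstack("sentiment", fill_value=0)
        .reindex(columns=["positive", "neutral", "negative"], fill_value=0)
    )
    total = counts.sum(axis=1)
    out = counts.div(total, axis=0).add_suffix("_share")
    out["n"] = total
    out["net_sentiment"] = out["positive_share"] - out["negative_share"]
    out.columns.name = None
    return out.reset_index()


scored = pd.DataFrame({
    "created_at": pd.to_datetime(["2024-03-04", "2024-03-05", "2024-03-06", "2024-03-06"]),
    "channel": ["app", "app", "app", "web"],
    "product_category": ["denim", "denim", "denim", "outerwear"],
    "sentiment": ["positive", "negative", "negative", "neutral"],
})
print(sentiment_trends(scored))
```

Keeping the volume column `n` next to the shares lets a dashboard suppress or flag cells with too few records, so a handful of reviews in a small category does not read as a trend.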