Business Context
ShopSphere, an e-commerce marketplace, wants to improve product-review analytics. The search and trust teams need a practical comparison between sparse TF-IDF features and dense word embeddings to understand which representation is better for downstream sentiment classification.
Data
You are given 850,000 English product reviews collected over 18 months.
- Task: Predict review sentiment (negative, neutral, or positive)
- Text length: 5-400 tokens (median: 48)
- Language: English only
- Label distribution: positive 68%, neutral 14%, negative 18%
- Noise: HTML fragments, emojis, repeated punctuation, misspellings, SKU codes, and boilerplate shipping text
The team wants both a conceptual explanation and an implementation that compares classical and neural text representations under realistic constraints.
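The noise types listed above imply a cleaning pass before either representation is built. A minimal stdlib-only sketch; the SKU pattern and the boilerplate phrases are illustrative assumptions, not extracted from the real corpus:

```python
import html
import re

# Assumed boilerplate phrases; the real list would come from inspecting the corpus
BOILERPLATE = [r"item arrived on time", r"shipping was fast"]

def clean_review(text: str) -> str:
    """Rough cleaning pass for review text; patterns are illustrative."""
    text = html.unescape(text)                      # decode entities like &amp;
    text = re.sub(r"<[^>]+>", " ", text)            # strip HTML fragments
    text = re.sub(r"\b[A-Z0-9]{8,}\b", " ", text)   # drop SKU-like codes (assumed format)
    text = re.sub(r"([!?.])\1+", r"\1", text)       # collapse repeated punctuation
    for pat in BOILERPLATE:
        text = re.sub(pat, " ", text, flags=re.IGNORECASE)
    return re.sub(r"\s+", " ", text).strip()

print(clean_review("Great&nbsp;phone!!! <br/> SKU12345XYZ item arrived on time."))
```

Emojis and misspellings are deliberately left alone here: subword tokenizers handle them natively, while the TF-IDF pipeline can treat them as vocabulary items or filter them by document frequency.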
Success Criteria
- Achieve macro-F1 >= 0.82 on a held-out test set
- Show a clear comparison of accuracy, training speed, inference latency, and interpretability between TF-IDF and embeddings
- Explain when each representation is preferable in production
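The macro-F1 target is worth making concrete: it is the unweighted mean of per-class F1, so the 14% neutral class counts as much as the 68% positive class. A pure-Python sketch of the metric, using the task's label names:

```python
def macro_f1(y_true, y_pred, labels=("negative", "neutral", "positive")):
    """Unweighted mean of per-class F1; every class counts equally,
    which matters given the 68% positive skew."""
    f1s = []
    for lab in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == lab and p == lab)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != lab and p == lab)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == lab and p != lab)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

y_true = ["positive", "positive", "neutral", "negative"]
y_pred = ["positive", "neutral", "neutral", "negative"]
print(round(macro_f1(y_true, y_pred), 3))  # one positive misread as neutral drags the mean
```

A classifier that always predicts "positive" would score 68% accuracy but roughly 0.27 macro-F1, which is why the 0.82 bar is stated in macro-F1 rather than accuracy.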
Constraints
- Training must run on a single 16GB GPU or standard CPU fallback
- Online inference latency should remain under 120ms per review
- The solution must be reproducible and easy to retrain weekly
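The 120 ms latency budget should be checked empirically for both models, not assumed. A stdlib-only measurement sketch; `predict_fn` is a placeholder for either model's single-review inference call:

```python
import statistics
import time

def latency_percentiles(predict_fn, reviews, warmup=5):
    """Time single-review inference and report (p50, p95) in milliseconds."""
    for r in reviews[:warmup]:          # warm caches before measuring
        predict_fn(r)
    timings = []
    for r in reviews:
        start = time.perf_counter()
        predict_fn(r)
        timings.append((time.perf_counter() - start) * 1000.0)
    timings.sort()
    p50 = statistics.median(timings)
    p95 = timings[int(0.95 * (len(timings) - 1))]
    return p50, p95

# Dummy predictor standing in for a real classifier
p50, p95 = latency_percentiles(lambda r: "positive", ["great product"] * 100)
print(f"p50={p50:.3f}ms p95={p95:.3f}ms")
```

Reporting p95 rather than the mean matters for an online budget: transformer inference on CPU fallback can have a long tail that a mean would hide.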
Requirements
- Explain what word embeddings are and how they differ from TF-IDF in representation, semantics, dimensionality, and sparsity.
- Build a TF-IDF + Logistic Regression baseline.
- Build an embedding-based classifier using a modern transformer in Python.
- Implement realistic preprocessing, tokenization, training, and evaluation.
- Compare both approaches and recommend one for ShopSphere's review pipeline.
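The steps above can be anchored with a minimal sketch of the TF-IDF + Logistic Regression baseline. The six toy reviews stand in for the real 850k-review corpus, and `class_weight="balanced"` is one reasonable response to the skewed label distribution, not a prescribed choice:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Toy stand-in for the 850k-review corpus
texts = [
    "absolutely love this phone, great battery",
    "terrible quality, broke after two days",
    "it works, nothing special either way",
    "fantastic value, would buy again",
    "awful customer service, very disappointed",
    "average product, does the job",
]
labels = ["positive", "negative", "neutral", "positive", "negative", "neutral"]

# Word + bigram features; sublinear tf dampens repeated tokens in long reviews
baseline = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), sublinear_tf=True, min_df=1)),
    ("clf", LogisticRegression(class_weight="balanced", max_iter=1000)),
])
baseline.fit(texts, labels)
print(baseline.predict(["great battery, love it"]))
```

The same `Pipeline` object also serves the comparison requirements: its sparse feature matrix makes the dimensionality/sparsity contrast with dense embeddings measurable, and `LogisticRegression.coef_` gives the per-term interpretability that a transformer classifier lacks.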