ShopLens, an e-commerce analytics company, needs a baseline text classification pipeline that automatically labels customer reviews as positive or negative before low-rated feedback is routed to support. The hiring team will assess your practical knowledge of Pandas for data preparation, Scikit-Learn for classical ML, and PyTorch or TensorFlow for a simple neural baseline.
You are given a historical review dataset exported from the marketplace data warehouse.
| Feature Group | Count | Examples |
|---|---|---|
| Text | 1 | review_text |
| Numeric metadata | 4 | review_length, helpful_votes, days_since_purchase, rating |
| Categorical metadata | 3 | product_category, country, device_type |
| Target | 1 | sentiment_label |
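
As a starting point for data preparation, the feature groups in the table can be written down as column lists and used to audit missing or empty values with Pandas. This is a minimal sketch, not a prescribed solution: the helper name `audit_missing` and the toy rows are hypothetical, and only the column names come from the table.

```python
import pandas as pd

# Column groups copied from the feature table.
TEXT_COLS = ["review_text"]
NUMERIC_COLS = ["review_length", "helpful_votes", "days_since_purchase", "rating"]
CATEGORICAL_COLS = ["product_category", "country", "device_type"]
TARGET_COL = "sentiment_label"


def audit_missing(df: pd.DataFrame) -> pd.Series:
    """Return the fraction of missing (or empty-text) values per expected column."""
    expected = TEXT_COLS + NUMERIC_COLS + CATEGORICAL_COLS + [TARGET_COL]
    absent = [col for col in expected if col not in df.columns]
    if absent:
        raise KeyError(f"missing expected columns: {absent}")

    checked = df[expected].copy()
    # Count empty or whitespace-only reviews as missing text.
    text = checked["review_text"].fillna("").astype(str).str.strip()
    checked["review_text"] = text.mask(text.eq(""))
    return checked.isna().mean().sort_values(ascending=False)


# Hypothetical toy rows, for illustration only.
toy = pd.DataFrame(
    {
        "review_text": ["Great product", "", None, "Broke after a week"],
        "review_length": [13, 0, 0, 18],
        "helpful_votes": [3, None, 0, 1],
        "days_since_purchase": [10, 4, 30, 7],
        "rating": [5, 3, 4, 1],
        "product_category": ["toys", "books", "toys", "home"],
        "country": ["US", "DE", "US", "FR"],
        "device_type": ["mobile", None, "desktop", "mobile"],
        "sentiment_label": [1, 1, 1, 0],
    }
)
missing_rates = audit_missing(toy)
print(missing_rates)
```

Running the same audit on the real export is a quick way to confirm the reported missingness before choosing an imputation strategy.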

Data-quality notes: missing values in helpful_votes, 7% missing in device_type, 3% empty review_text.

A strong solution should achieve F1 >= 0.84 on the negative-review class and ROC-AUC >= 0.90 on the held-out test set. The candidate should also compare at least one classical model with one neural-network approach and explain the tradeoffs.
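
One way to set up the classical baseline and compute the two target metrics is sketched below with Scikit-Learn. Several things here are assumptions rather than part of the brief: the label encoding (0 = negative, 1 = positive), the synthetic data produced by the hypothetical `make_toy_reviews` helper, the choice of TF-IDF plus logistic regression, and the decision to leave `rating` out on the assumption that `sentiment_label` is derived from it. Only a subset of the metadata columns is used to keep the example short.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

NEGATIVE, POSITIVE = 0, 1  # assumed label encoding, not stated in the brief


def make_toy_reviews(n=400, seed=0):
    """Build a synthetic stand-in for the review export (illustration only)."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n)
    pos_words = ["great", "love", "excellent", "fast"]
    neg_words = ["broken", "refund", "terrible", "late"]
    texts = [
        " ".join(rng.choice(pos_words if label == POSITIVE else neg_words, size=3))
        for label in y
    ]
    df = pd.DataFrame(
        {
            "review_text": texts,
            "helpful_votes": rng.poisson(2, size=n).astype(float),
            "days_since_purchase": rng.integers(1, 90, size=n),
            "device_type": rng.choice(["mobile", "desktop"], size=n),
            "sentiment_label": y,
        }
    )
    # Inject gaps loosely resembling the data-quality notes.
    df.loc[rng.random(n) < 0.10, "helpful_votes"] = np.nan
    df.loc[rng.random(n) < 0.07, "device_type"] = np.nan
    df.loc[rng.random(n) < 0.03, "review_text"] = ""
    return df


df = make_toy_reviews()
X = df.drop(columns="sentiment_label")
y = df["sentiment_label"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

preprocess = ColumnTransformer(
    [
        # A string (not a list) selects a 1-D column, as TfidfVectorizer expects.
        ("text", TfidfVectorizer(), "review_text"),
        (
            "num",
            Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]),
            ["helpful_votes", "days_since_purchase"],
        ),
        (
            "cat",
            Pipeline([("impute", SimpleImputer(strategy="constant", fill_value="unknown")),
                      ("onehot", OneHotEncoder(handle_unknown="ignore"))]),
            ["device_type"],
        ),
    ]
)
pipe = Pipeline(
    [("prep", preprocess),
     ("clf", LogisticRegression(max_iter=1000, class_weight="balanced"))]
)
pipe.fit(X_train, y_train)

# F1 is computed on the negative class; ROC-AUC uses the positive-class score.
pred = pipe.predict(X_test)
pos_idx = list(pipe.named_steps["clf"].classes_).index(POSITIVE)
proba_pos = pipe.predict_proba(X_test)[:, pos_idx]
f1_negative = f1_score(y_test, pred, pos_label=NEGATIVE)
roc_auc = roc_auc_score(y_test, proba_pos)
print(f"F1 (negative class): {f1_negative:.3f}  ROC-AUC: {roc_auc:.3f}")
```

Scores on this synthetic data say nothing about the real dataset. The point is the evaluation setup: fitting all preprocessing inside the pipeline keeps test-set statistics out of training, and the same two metric calls can be reused on the neural baseline's predictions so the comparison is like for like.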