ShopSphere, a large ecommerce marketplace, receives thousands of customer reviews and post-purchase comments every day. The support and product teams want a model that automatically classifies each text into actionable intent categories so issues can be routed faster and trend analysis can be automated.
You have 1.8 million labeled English reviews collected over 18 months. Each record contains free-form text, a product category, and one of four labels: Product Quality, Delivery Issue, Billing/Refund, or General Praise. Text length ranges from 5 to 350 words with a median of 42 words. The class distribution is moderately imbalanced: Product Quality (38%), Delivery Issue (24%), Billing/Refund (14%), General Praise (24%). The data includes misspellings, emojis, repeated punctuation, HTML fragments, and copied order metadata.
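The noise listed above suggests a light normalization pass before tokenization. The following is a minimal sketch of one, not a prescribed pipeline. The function name `clean_review`, the `<ORDER_REF>` placeholder, and above all the order-metadata pattern are assumptions made for illustration, since the brief does not say what the copied metadata looks like. Emojis are kept on purpose because they may carry sentiment, and misspellings are left for the model or a subword tokenizer to absorb.

```python
import html
import re

# Matches simple HTML tags such as <p>, </div>, or <br/>.
TAG_RE = re.compile(r"</?[A-Za-z][^>]*>")
# ASSUMPTION: copied order metadata looks like "Order #123-4567890" or
# "order: AB1234". The real format must be checked against the data.
ORDER_RE = re.compile(r"\border\s*[#:]\s*[\w-]*\d[\w-]*", re.IGNORECASE)
# Runs of the same punctuation mark, e.g. "!!!" or "???".
REPEAT_PUNCT_RE = re.compile(r"([!?.,])\1+")
WHITESPACE_RE = re.compile(r"\s+")


def clean_review(text: str) -> str:
    """Normalize one raw review while keeping its wording and emojis."""
    text = html.unescape(text)                 # "&amp;" -> "&"
    text = TAG_RE.sub(" ", text)               # drop HTML fragments
    text = ORDER_RE.sub(" <ORDER_REF> ", text) # mask order identifiers
    text = REPEAT_PUNCT_RE.sub(r"\1", text)    # "!!!" -> "!"
    return WHITESPACE_RE.sub(" ", text).strip()


raw = "<p>Order #123-4567890 arrived late!!!</p> Refund pls &amp; thanks 😡😡"
print(clean_review(raw))
# prints: <ORDER_REF> arrived late! Refund pls & thanks 😡😡
```

Masking the identifier rather than deleting it keeps the signal that the customer pasted order details, which may itself correlate with Delivery Issue or Billing/Refund.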
A strong solution should achieve macro-F1 >= 0.84 across the four classes and recall >= 0.90 on the Billing/Refund class, since missed refund complaints create operational risk. The model should support both batch scoring and near-real-time inference for new reviews.
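The two acceptance targets can be checked together on a held-out set. The sketch below uses only the standard library and computes per-class precision, recall, and F1, then the unweighted mean of the F1 scores (macro-F1). The helper names `per_class_scores` and `meets_targets` are illustrative rather than part of any existing codebase, and the toy labels are made up purely to show the call.

```python
from collections import Counter

LABELS = ["Product Quality", "Delivery Issue", "Billing/Refund", "General Praise"]


def per_class_scores(y_true, y_pred, labels=LABELS):
    """Return precision, recall, and F1 for each label."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted label gains a false positive
            fn[truth] += 1  # true label gains a false negative
    scores = {}
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        scores[label] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


def meets_targets(y_true, y_pred, macro_f1_min=0.84, refund_recall_min=0.90):
    """Check the macro-F1 and Billing/Refund recall targets from the brief."""
    scores = per_class_scores(y_true, y_pred)
    macro_f1 = sum(s["f1"] for s in scores.values()) / len(scores)
    refund_recall = scores["Billing/Refund"]["recall"]
    return {
        "macro_f1": macro_f1,
        "refund_recall": refund_recall,
        "passes": macro_f1 >= macro_f1_min and refund_recall >= refund_recall_min,
    }


# Toy example: one of two refund complaints is misrouted to Product Quality.
y_true = ["Billing/Refund", "Billing/Refund", "Product Quality",
          "Delivery Issue", "General Praise"]
y_pred = ["Billing/Refund", "Product Quality", "Product Quality",
          "Delivery Issue", "General Praise"]
result = meets_targets(y_true, y_pred)
print(result["passes"])  # prints: False
```

Because macro-F1 weights all four classes equally, the 14% Billing/Refund class counts as much as the 38% Product Quality class, so a model can clear the macro-F1 bar and still miss the recall target, or the reverse. Reporting both numbers side by side makes that trade-off visible.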