Classify Taobao Reviews by Intent

Business Context

Alibaba Group wants to better understand user feedback from Taobao product reviews and buyer-seller chat logs. Build an NLP solution that classifies short Chinese text into actionable categories so operations teams can route issues faster and summarize customer pain points.

Data

You are given 1.5 million labeled text records collected from Taobao over the last 12 months. Each record is a short user-generated message or review, typically 8-120 Chinese characters (median 34), with occasional emojis, SKU codes, seller shorthand, and mixed Chinese-English tokens. Labels are moderately imbalanced across four classes:

Product Quality: 38%
Logistics/Delivery: 24%
Customer Service: 18%
Price/Promotion: 20%
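
One common way to account for this imbalance during training is inverse-frequency class weighting. The sketch below derives such weights from the class shares listed above; the function name and the choice of the "balanced" heuristic (weight = total / (number of classes × class share)) are illustrative assumptions, not part of the problem statement.

```python
# Class shares as stated in the problem description.
CLASS_PRIORS = {
    "Product Quality": 0.38,
    "Logistics/Delivery": 0.24,
    "Customer Service": 0.18,
    "Price/Promotion": 0.20,
}


def balanced_class_weights(priors):
    """Return inverse-frequency weights so rarer classes count more in the loss.

    Hypothetical helper: uses the common 'balanced' heuristic
    weight = total / (n_classes * class_share).
    """
    total = sum(priors.values())
    n_classes = len(priors)
    return {label: total / (n_classes * share) for label, share in priors.items()}


if __name__ == "__main__":
    for label, weight in balanced_class_weights(CLASS_PRIORS).items():
        print(f"{label}: {weight:.4f}")
```

With these shares the weights stay within roughly a factor of two of each other, which is consistent with describing the imbalance as moderate.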

The corpus is primarily Simplified Chinese, with some dialect terms, misspellings, and repeated characters for emphasis.
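
The noise described above (full-width characters, repeated characters for emphasis, mixed Chinese-English tokens) can be reduced with a light normalization pass before tokenization. The following is a minimal sketch under stated assumptions: the function name and the specific rules are hypothetical choices, and ASCII letters and digits are deliberately left uncollapsed so SKU codes and prices survive.

```python
import re
import unicodedata

# Collapse a character repeated 3+ times down to two, e.g. "好好好好" -> "好好".
# Digits and ASCII letters are excluded so values like "1000" are not altered.
_REPEAT = re.compile(r"([^\da-zA-Z\s])\1{2,}")
_WHITESPACE = re.compile(r"\s+")


def normalize_text(text: str) -> str:
    """Lightly normalize a noisy marketplace message (illustrative rules only)."""
    # NFKC folds full-width Latin letters, digits, and punctuation to ASCII forms.
    text = unicodedata.normalize("NFKC", text)
    # Lowercasing affects only cased scripts; Chinese characters are unchanged.
    text = text.lower()
    text = _REPEAT.sub(r"\1\1", text)
    return _WHITESPACE.sub(" ", text).strip()


if __name__ == "__main__":
    print(normalize_text("太慢了！！！！　ＳＫＵ１２３"))
```

Keeping two copies of a repeated character, rather than one, preserves the signal that the writer was being emphatic while shrinking the vocabulary of variants.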

Success Criteria

A good solution should achieve macro-F1 ≥ 0.84, with recall ≥ 0.88 on the Logistics/Delivery class because delayed shipment complaints must be escalated quickly. The model should support batch scoring of 5 million texts/day and provide interpretable outputs for analysts.
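
To make the two thresholds concrete, the sketch below computes per-class precision, recall, and F1 from parallel lists of true and predicted labels, then checks macro-F1 and Logistics/Delivery recall against the targets above. The helper names are hypothetical; only the thresholds and class names come from the problem statement.

```python
from collections import Counter

CLASSES = [
    "Product Quality",
    "Logistics/Delivery",
    "Customer Service",
    "Price/Promotion",
]


def per_class_scores(y_true, y_pred, classes=CLASSES):
    """Return {class: {"precision", "recall", "f1"}} for single-label predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = {}
    for c in classes:
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[c] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


def meets_success_criteria(y_true, y_pred, min_macro_f1=0.84, min_logistics_recall=0.88):
    """Check macro-F1 (unweighted mean of class F1) and Logistics/Delivery recall."""
    scores = per_class_scores(y_true, y_pred)
    macro_f1 = sum(s["f1"] for s in scores.values()) / len(scores)
    return (
        macro_f1 >= min_macro_f1
        and scores["Logistics/Delivery"]["recall"] >= min_logistics_recall
    )
```

Because macro-F1 averages classes without weighting by frequency, a model can pass it while still missing the Logistics/Delivery recall floor, so the two conditions are checked separately.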

Constraints

Inference should remain efficient enough for large-scale daily processing on Alibaba Cloud GPU or CPU instances.
The approach should handle noisy marketplace text without heavy manual rules.
Prefer a solution that can be retrained weekly as new labeled data arrives.
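
The daily volume in the success criteria translates into a sustained throughput requirement, which helps when sizing instances. The sketch below is back-of-the-envelope arithmetic only: the function names are hypothetical, and the per-instance rate and batch window in the example are placeholder inputs that would need to be measured or agreed on in practice.

```python
import math


def required_rate(texts_per_day: int, window_hours: float = 24.0) -> float:
    """Texts per second needed to finish the daily volume within the window."""
    return texts_per_day / (window_hours * 3600)


def instances_needed(texts_per_day: int, per_instance_rate: float,
                     window_hours: float = 24.0) -> int:
    """Number of instances required, given a measured texts/second per instance."""
    return math.ceil(required_rate(texts_per_day, window_hours) / per_instance_rate)


if __name__ == "__main__":
    # 5 million texts/day spread over a full day.
    print(f"{required_rate(5_000_000):.2f} texts/second sustained")
    # Placeholder assumption: a 4-hour nightly window and 200 texts/second per instance.
    print(instances_needed(5_000_000, per_instance_rate=200.0, window_hours=4.0))
```

Spread evenly over 24 hours, 5 million texts per day is about 58 texts per second; compressing the job into a shorter batch window raises the required rate proportionally.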

Requirements

Build a multi-class text classification pipeline for the four intent categories.
Describe your preprocessing for Chinese marketplace text.
Implement a strong baseline and a transformer-based model in Python.
Explain how you would evaluate the model under class imbalance, analyze confusion patterns, and monitor drift over time.
State what model you would deploy first and why.
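
As a starting point for the baseline requirement, the sketch below shows a character n-gram multinomial Naive Bayes classifier written with the standard library only. It is a dependency-free stand-in chosen for illustration, not the strong baseline itself (in practice a TF-IDF plus linear model would be a more typical choice), and the class name, toy training messages, and hyperparameters are all assumptions. Character n-grams avoid the need for a Chinese word segmenter and tolerate misspellings reasonably well.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n_min=1, n_max=2):
    """Return all character n-grams of length n_min..n_max."""
    return [text[i:i + n] for n in range(n_min, n_max + 1)
            for i in range(len(text) - n + 1)]


class CharNgramNB:
    """Multinomial Naive Bayes over character n-grams with additive smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            grams = char_ngrams(text)
            self.feature_counts[label].update(grams)
            self.vocab.update(grams)
        self.totals = {c: sum(self.feature_counts[c].values())
                       for c in self.class_counts}
        return self

    def log_scores(self, text):
        """Per-class log scores; n-grams never seen in training are skipped."""
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        grams = [g for g in char_ngrams(text) if g in self.vocab]
        scores = {}
        for c, n_c in self.class_counts.items():
            score = math.log(n_c / n_docs)  # class prior
            denom = self.totals[c] + self.alpha * vocab_size
            for g in grams:
                score += math.log((self.feature_counts[c][g] + self.alpha) / denom)
            scores[c] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Toy, hand-written examples for illustration only (not from the real corpus).
TOY_TEXTS = ["快递太慢了", "物流三天没更新", "质量很差", "衣服掉色严重",
             "客服态度不好", "客服不回复", "价格比昨天贵", "优惠券用不了"]
TOY_LABELS = ["Logistics/Delivery", "Logistics/Delivery",
              "Product Quality", "Product Quality",
              "Customer Service", "Customer Service",
              "Price/Promotion", "Price/Promotion"]

if __name__ == "__main__":
    model = CharNgramNB().fit(TOY_TEXTS, TOY_LABELS)
    print(model.predict("快递还没到"))
```

The per-class log scores returned by `log_scores` give analysts a simple view of which class the evidence favored, which is one way to approach the interpretability requirement for a baseline of this kind.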
