Business Context
Google Cloud’s technical enablement team receives thousands of internal and customer-facing questions about GenAI, ML, and NLP across support channels, training forums, and solution review queues. The team wants an NLP system that automatically routes each question to the right topic bucket, so specialists can respond faster and reporting can show where confusion is concentrated.
Data
You are given a historical dataset of 420,000 English-language questions collected from Google Cloud training portals, support forms, and internal Q&A threads.
- Task: classify each question into one of three labels: GenAI, ML, or NLP
- Text length: 8-220 tokens, median 34 tokens
- Language: English only
- Label distribution: GenAI 28%, ML 37%, NLP 35%
- Noise: duplicated questions, product names (Vertex AI, BigQuery, Gemini), markdown fragments, URLs, and occasional code snippets
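The noise sources listed above can be handled with a small normalization pass before tokenization. The sketch below is illustrative only: the placeholder tokens (`[code]`, `[url]`), the product-alias table, and the exact regex rules are assumptions that would need tuning against the real data.

```python
import re

# Assumed alias table: map multi-word product names to single tokens so the
# tokenizer keeps them intact. Extend with the real product vocabulary.
PRODUCT_ALIASES = {"vertex ai": "vertex_ai"}

def clean_question(text: str) -> str:
    """Normalize one question: lowercase, mask code and URLs, strip
    markdown fragments, and protect product names as single tokens."""
    text = text.lower()
    text = re.sub(r"```.*?```", " [code] ", text, flags=re.DOTALL)  # fenced code snippets
    text = re.sub(r"`[^`]+`", " [code] ", text)                     # inline code
    text = re.sub(r"https?://\S+", " [url] ", text)                 # URLs
    text = re.sub(r"[#*>]+", " ", text)                             # markdown fragments
    for name, token in PRODUCT_ALIASES.items():
        text = text.replace(name, token)
    return re.sub(r"\s+", " ", text).strip()

def dedupe(questions):
    """Drop duplicated questions, comparing their normalized forms."""
    seen, unique = set(), []
    for q in questions:
        key = clean_question(q)
        if key not in seen:
            seen.add(key)
            unique.append(q)
    return unique
```

Masking rather than deleting code and URLs preserves the signal that a question *contained* code (often an ML or NLP implementation question) without letting rare literal strings inflate the vocabulary.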
Success Criteria
A good solution should achieve:
- Macro-F1 ≥ 0.88 on a held-out test set
- Per-class F1 ≥ 0.85 for NLP and GenAI
- Inference latency < 80 ms per query in batch scoring
- Clear handling of ambiguous questions such as “How is an LLM different from a traditional NLP classifier?”
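The macro-F1 and per-class F1 targets above can be checked with a few lines of dependency-free code; a sketch of how a solution might gate itself against these thresholds (the threshold values are copied from the criteria above, everything else is illustrative):

```python
LABELS = ["GenAI", "ML", "NLP"]

def per_class_f1(y_true, y_pred, labels=LABELS):
    """Per-class F1 computed from scratch: F1 = 2*TP / (2*TP + FP + FN)."""
    scores = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores

def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of per-class F1 -- robust to the mild class imbalance."""
    scores = per_class_f1(y_true, y_pred, labels)
    return sum(scores.values()) / len(labels)

def meets_success_criteria(y_true, y_pred):
    scores = per_class_f1(y_true, y_pred)
    return (macro_f1(y_true, y_pred) >= 0.88
            and scores["NLP"] >= 0.85
            and scores["GenAI"] >= 0.85)
```

Macro averaging is the right choice here because it weights the smaller GenAI class (28%) equally with ML and NLP, so a model cannot hit the target by favoring the majority classes.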
Constraints
- Must run in a Google Cloud production environment with modest GPU availability
- The model should be explainable enough for QA reviewers to inspect common failure modes
- Weekly retraining is allowed, but heavy manual relabeling is not
Requirements
- Build a 3-class text classification pipeline for GenAI vs ML vs NLP questions.
- Describe preprocessing for product names, code fragments, URLs, and repeated boilerplate.
- Implement a modern Python solution with a transformer model and a lightweight classical baseline for comparison.
- Define evaluation metrics, validation strategy, and ambiguity/error analysis.
- Explain how you would distinguish high-level conceptual NLP questions from broader GenAI or ML questions.
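For the lightweight comparison baseline the requirements call for, something as simple as multinomial Naive Bayes over bag-of-words is enough to calibrate how much the transformer actually buys. The sketch below is one possible dependency-free implementation, not a prescribed design; in practice TF-IDF + logistic regression (e.g. via scikit-learn) would be an equally valid choice.

```python
import math
from collections import Counter

class NaiveBayesBaseline:
    """Multinomial Naive Bayes with add-one smoothing over whitespace
    tokens -- an illustrative lightweight baseline for the 3-class task."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.priors = Counter(labels)              # class frequencies
        self.n_docs = len(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab_size = len(set().union(*self.word_counts.values()))
        self.totals = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, text):
        tokens = text.lower().split()
        best_class, best_logprob = None, -math.inf
        for c in self.classes:
            logprob = math.log(self.priors[c] / self.n_docs)
            for tok in tokens:
                logprob += math.log(
                    (self.word_counts[c][tok] + 1)
                    / (self.totals[c] + self.vocab_size)
                )
            if logprob > best_logprob:
                best_class, best_logprob = c, logprob
        return best_class
```

Beyond its speed (well under the 80 ms budget), such a baseline is directly inspectable: per-class token log-probabilities double as a crude explanation for QA reviewers, and the tokens that dominate a misrouted question often reveal exactly the GenAI/NLP ambiguity the brief asks about.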