Business Context
Google Cloud’s technical enablement team receives thousands of internal and customer-facing questions about GenAI, ML, and NLP across support channels, training forums, and solution review queues. The team wants an NLP system that automatically routes each question to the right topic bucket, so specialists can respond faster and reporting can show where confusion is concentrated.
Data
You are given a historical dataset of 420,000 English-language questions collected from Google Cloud training portals, support forms, and internal Q&A threads.
- Task: classify each question into one of three labels: GenAI, ML, or NLP
- Text length: 8-220 tokens, median 34 tokens
- Language: English only
- Label distribution: GenAI 28%, ML 37%, NLP 35%
- Noise: duplicated questions, product names (Vertex AI, BigQuery, Gemini), markdown fragments, URLs, and occasional code snippets
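The noise sources listed above can be handled with a small normalization pass before tokenization. The sketch below is illustrative only: the placeholder tokens (`[code]`, `[url]`), the product-alias table, and the exact regex rules are assumptions that would need tuning against the real data.

```python
import re

# Assumed alias table: map multi-word product names to single tokens so the
# tokenizer keeps them intact. Extend with the real product vocabulary.
PRODUCT_ALIASES = {"vertex ai": "vertex_ai"}

def clean_question(text: str) -> str:
    """Normalize one question: lowercase, mask code and URLs, strip
    markdown fragments, and protect product names as single tokens."""
    text = text.lower()
    text = re.sub(r"```.*?```", " [code] ", text, flags=re.DOTALL)  # fenced code snippets
    text = re.sub(r"`[^`]+`", " [code] ", text)                     # inline code
    text = re.sub(r"https?://\S+", " [url] ", text)                 # URLs
    text = re.sub(r"[#*>]+", " ", text)                             # markdown fragments
    for name, token in PRODUCT_ALIASES.items():
        text = text.replace(name, token)
    return re.sub(r"\s+", " ", text).strip()

def dedupe(questions):
    """Drop duplicated questions, comparing their normalized forms."""
    seen, unique = set(), []
    for q in questions:
        key = clean_question(q)
        if key not in seen:
            seen.add(key)
            unique.append(q)
    return unique
```

Masking rather than deleting code and URLs preserves the signal that a question *contained* code (often an ML or NLP implementation question) without letting rare literal strings inflate the vocabulary.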
Success Criteria
A good solution should achieve:
- Macro-F1 ≥ 0.88 on a held-out test set
- Per-class F1 ≥ 0.85 for NLP and GenAI
- Inference latency < 80 ms per query in batch scoring
- Clear handling of ambiguous questions such as “How is an LLM different from a traditional NLP classifier?”
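The macro-F1 and per-class F1 targets above can be checked with a few lines of dependency-free code; a sketch of how a solution might gate itself against these thresholds (the threshold values are copied from the criteria above, everything else is illustrative):

```python
LABELS = ["GenAI", "ML", "NLP"]

def per_class_f1(y_true, y_pred, labels=LABELS):
    """Per-class F1 computed from scratch: F1 = 2*TP / (2*TP + FP + FN)."""
    scores = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores

def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of per-class F1 -- robust to the mild class imbalance."""
    scores = per_class_f1(y_true, y_pred, labels)
    return sum(scores.values()) / len(labels)

def meets_success_criteria(y_true, y_pred):
    scores = per_class_f1(y_true, y_pred)
    return (macro_f1(y_true, y_pred) >= 0.88
            and scores["NLP"] >= 0.85
            and scores["GenAI"] >= 0.85)
```

Macro averaging is the right choice here because it weights the smaller GenAI class (28%) equally with ML and NLP, so a model cannot hit the target by favoring the majority classes.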
Constraints
- Must run in a Google Cloud production environment with modest GPU availability
- The model should be explainable enough for QA reviewers to inspect common failure modes
- Weekly retraining is allowed, but heavy manual relabeling is not
Requirements
- Build a 3-class text classification pipeline for GenAI vs ML vs NLP questions.
- Describe preprocessing for product names, code fragments, URLs, and repeated boilerplate.
- Implement a modern Python solution with a transformer model and a lightweight classical baseline for comparison.
- Define evaluation metrics, validation strategy, and ambiguity/error analysis.
- Explain how you would distinguish high-level conceptual NLP questions from broader GenAI or ML questions.
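For the lightweight comparison baseline the requirements call for, something as simple as multinomial Naive Bayes over bag-of-words is enough to calibrate how much the transformer actually buys. The sketch below is one possible dependency-free implementation, not a prescribed design; in practice TF-IDF + logistic regression (e.g. via scikit-learn) would be an equally valid choice.

```python
import math
from collections import Counter

class NaiveBayesBaseline:
    """Multinomial Naive Bayes with add-one smoothing over whitespace
    tokens -- an illustrative lightweight baseline for the 3-class task."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.priors = Counter(labels)              # class frequencies
        self.n_docs = len(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab_size = len(set().union(*self.word_counts.values()))
        self.totals = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, text):
        tokens = text.lower().split()
        best_class, best_logprob = None, -math.inf
        for c in self.classes:
            logprob = math.log(self.priors[c] / self.n_docs)
            for tok in tokens:
                logprob += math.log(
                    (self.word_counts[c][tok] + 1)
                    / (self.totals[c] + self.vocab_size)
                )
            if logprob > best_logprob:
                best_class, best_logprob = c, logprob
        return best_class
```

Beyond its speed (well under the 80 ms budget), such a baseline is directly inspectable: per-class token log-probabilities double as a crude explanation for QA reviewers, and the tokens that dominate a misrouted question often reveal exactly the GenAI/NLP ambiguity the brief asks about.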