Business Context
NovaSearch is upgrading its enterprise document assistant and needs a lightweight internal review system that classifies proposed model architectures for different NLP workloads. The platform team wants candidates to explain when Mixture of Experts (MoE), FlashAttention, or state-space models (SSMs) are the right choice for long-context and high-throughput language applications.
Data
You are given 180,000 internal architecture notes, benchmark summaries, and design-review comments, each labeled with one of three classes: MoE, FlashAttention, or StateSpaceModel.
- Volume: 180K labeled documents, plus 25K unlabeled notes for future semi-supervised use
- Text length: 40-1,200 tokens (median 220)
- Language: English technical writing with code snippets, latency tables, and GPU memory references
- Label distribution: MoE 34%, FlashAttention 38%, StateSpaceModel 28%
Success Criteria
A production-ready classifier should achieve macro-F1 >= 0.88 and per-class recall >= 0.84, and sustain batch inference of 10K documents per hour. Preprocessing and tokenization should preserve technical terminology such as KV-cache, routing, selective state updates, and HBM bandwidth rather than splitting or discarding it.
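As a rough illustration, the thresholds above could be checked with scikit-learn on a held-out test set; the variable names (y_true, y_pred) and the exact label strings below are assumptions, not part of the spec.

```python
# Minimal sketch: check the success thresholds on a held-out test set.
# Assumes y_true and y_pred are lists of label strings.
from sklearn.metrics import f1_score, recall_score

LABELS = ["MoE", "FlashAttention", "StateSpaceModel"]

def meets_success_criteria(y_true, y_pred):
    macro_f1 = f1_score(y_true, y_pred, labels=LABELS, average="macro")
    per_class_recall = recall_score(y_true, y_pred, labels=LABELS, average=None)
    return macro_f1 >= 0.88 and all(r >= 0.84 for r in per_class_recall)
```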
Constraints
- Single A10G GPU for fine-tuning
- Inference latency under 120 ms per document at p95 (a latency-check sketch follows this list)
- Explanations must remain interpretable for model review committees
- No external API calls; all processing must run in a secure Python environment
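One way to audit the latency budget is to time single-document inference over a representative sample and take the 95th percentile; the `classify` callable below is a hypothetical stand-in for whatever inference entry point the final system exposes.

```python
# Minimal sketch: measure p95 per-document latency in milliseconds.
import statistics
import time

def p95_latency_ms(classify, documents):
    timings = []
    for doc in documents:
        start = time.perf_counter()
        classify(doc)  # hypothetical single-document inference call
        timings.append((time.perf_counter() - start) * 1000.0)
    # statistics.quantiles with n=100 returns 99 cut points; index 94 is p95.
    return statistics.quantiles(timings, n=100)[94]
```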
Requirements
- Build a multi-class text classification system that predicts whether a note primarily discusses MoE, FlashAttention, or SSMs.
- Define a preprocessing pipeline for technical NLP text, including handling of code-like tokens and benchmark tables (a preprocessing sketch follows this list).
- Fine-tune a modern transformer baseline in Python and evaluate it against a simpler TF-IDF baseline (baseline sketches follow this list).
- Explain, at a high level, the conceptual differences among MoE, FlashAttention, and SSMs, and how those differences appear in the text.
- Propose an error analysis plan for ambiguous cases, such as long-context efficiency notes that mention more than one architecture (an error-analysis sketch follows this list).
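To make the preprocessing requirement concrete, the sketch below normalizes code snippets and benchmark-table rows into placeholder tokens while leaving domain terms such as KV-cache and HBM untouched; the regexes and placeholder names are illustrative assumptions, not a prescribed scheme.

```python
# Sketch: normalize code-like spans and benchmark-table rows in technical notes.
import re

CODE_FENCE = re.compile(r"`{3}.*?`{3}", re.DOTALL)        # fenced code blocks
INLINE_CODE = re.compile(r"`[^`]+`")                       # inline code spans
TABLE_ROW = re.compile(r"^\s*\|.*\|\s*$", re.MULTILINE)    # pipe-delimited table rows

def normalize_note(text: str) -> str:
    text = CODE_FENCE.sub(" [CODE] ", text)
    text = INLINE_CODE.sub(" [CODE] ", text)
    text = TABLE_ROW.sub(" [TABLE_ROW] ", text)
    # Collapse whitespace but keep casing so terms like "HBM" and "KV-cache" survive.
    return re.sub(r"\s+", " ", text).strip()
```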
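A minimal TF-IDF baseline could look like the following, assuming scikit-learn and hypothetical train_texts / train_labels arrays; the hyperparameters are placeholders to be tuned on a validation split.

```python
# Sketch: TF-IDF + logistic regression baseline for the three-way task.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

baseline = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), min_df=5, sublinear_tf=True)),
    ("clf", LogisticRegression(max_iter=1000, class_weight="balanced")),
])
# baseline.fit(train_texts, train_labels)   # hypothetical training data
# preds = baseline.predict(test_texts)
```

This gives a cheap reference point: if the fine-tuned transformer does not clearly beat it on macro-F1, the extra GPU cost is hard to justify.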
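For the transformer baseline, a Hugging Face fine-tuning sketch that fits a single A10G is shown below; the model name, batch size, and epoch count are assumptions rather than recommendations, and train_ds / val_ds stand in for tokenized datasets.

```python
# Sketch: fine-tune an encoder classifier with Hugging Face Transformers.
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL_NAME = "distilroberta-base"  # assumed; any encoder that fits on an A10G works
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=3)

args = TrainingArguments(
    output_dir="arch-note-classifier",
    per_device_train_batch_size=32,
    num_train_epochs=3,
    fp16=True,  # mixed precision keeps memory well within A10G limits
)
# trainer = Trainer(model=model, args=args,
#                   train_dataset=train_ds, eval_dataset=val_ds)  # hypothetical datasets
# trainer.train()
```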
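For the error analysis plan, one concrete starting point is to tabulate confusion pairs and flag misclassified notes that mention more than one architecture; the keyword patterns below are illustrative and would need review against the actual corpus.

```python
# Sketch: surface confusion pairs and multi-architecture mentions among errors.
import re
from collections import Counter

ARCH_TERMS = {
    "MoE": re.compile(r"\b(mixture of experts|MoE|expert routing)\b", re.IGNORECASE),
    "FlashAttention": re.compile(r"\b(flash[- ]?attention|KV[- ]?cache)\b", re.IGNORECASE),
    "StateSpaceModel": re.compile(r"\b(state[- ]space|selective state|SSM)\b", re.IGNORECASE),
}

def analyze_errors(texts, y_true, y_pred):
    confusion = Counter()
    multi_arch_errors = []
    for text, gold, pred in zip(texts, y_true, y_pred):
        if gold == pred:
            continue
        confusion[(gold, pred)] += 1
        mentioned = [name for name, pat in ARCH_TERMS.items() if pat.search(text)]
        if len(mentioned) > 1:
            multi_arch_errors.append({"gold": gold, "pred": pred, "mentions": mentioned})
    return confusion, multi_arch_errors
```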