Business Context
NovaSearch is upgrading its enterprise document assistant and needs a lightweight internal review system that classifies proposed model architectures for different NLP workloads. The platform team wants candidates to explain when Mixture of Experts (MoE), FlashAttention, or state-space models (SSMs) are the right choice for long-context and high-throughput language applications.
Data
You are given 180,000 internal architecture notes, benchmark summaries, and design-review comments, each labeled with one of three classes: MoE, FlashAttention, or StateSpaceModel.
- Volume: 180K labeled documents, plus 25K unlabeled notes for future semi-supervised use
- Text length: 40-1,200 tokens (median 220)
- Language: English technical writing with code snippets, latency tables, and GPU memory references
- Label distribution: MoE 34%, FlashAttention 38%, StateSpaceModel 28%
Success Criteria
A production-ready classifier should achieve macro-F1 >= 0.88 and per-class recall >= 0.84, and sustain batch inference of 10K documents per hour. Preprocessing and tokenization should preserve technical terminology such as KV-cache, routing, selective state updates, and HBM bandwidth rather than splitting or discarding it.
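As a rough illustration, the thresholds above could be checked with scikit-learn on a held-out test set; the variable names (y_true, y_pred) and the exact label strings below are assumptions, not part of the spec.

```python
# Minimal sketch: check the success thresholds on a held-out test set.
# Assumes y_true and y_pred are lists of label strings.
from sklearn.metrics import f1_score, recall_score

LABELS = ["MoE", "FlashAttention", "StateSpaceModel"]

def meets_success_criteria(y_true, y_pred):
    macro_f1 = f1_score(y_true, y_pred, labels=LABELS, average="macro")
    per_class_recall = recall_score(y_true, y_pred, labels=LABELS, average=None)
    return macro_f1 >= 0.88 and all(r >= 0.84 for r in per_class_recall)
```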
Constraints
- Single A10G GPU for fine-tuning
- Inference latency under 120 ms per document at p95 (a latency-check sketch follows this list)
- Explanations must remain interpretable for model review committees
- No external API calls; all processing must run in a secure Python environment
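One way to audit the latency budget is to time single-document inference over a representative sample and take the 95th percentile; the `classify` callable below is a hypothetical stand-in for whatever inference entry point the final system exposes.

```python
# Minimal sketch: measure p95 per-document latency in milliseconds.
import statistics
import time

def p95_latency_ms(classify, documents):
    timings = []
    for doc in documents:
        start = time.perf_counter()
        classify(doc)  # hypothetical single-document inference call
        timings.append((time.perf_counter() - start) * 1000.0)
    # statistics.quantiles with n=100 returns 99 cut points; index 94 is p95.
    return statistics.quantiles(timings, n=100)[94]
```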
Requirements
- Build a multi-class text classification system that predicts whether a note primarily discusses MoE, FlashAttention, or SSMs.
- Define a preprocessing pipeline for technical NLP text, including handling of code-like tokens and benchmark tables (a preprocessing sketch follows this list).
- Fine-tune a modern transformer baseline in Python and evaluate it against a simpler TF-IDF baseline (baseline sketches follow this list).
- Explain, at a high level, the conceptual differences among MoE, FlashAttention, and SSMs, and how those differences appear in the text.
- Propose an error analysis plan for ambiguous cases, such as long-context efficiency notes that mention more than one architecture (an error-analysis sketch follows this list).
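To make the preprocessing requirement concrete, the sketch below normalizes code snippets and benchmark-table rows into placeholder tokens while leaving domain terms such as KV-cache and HBM untouched; the regexes and placeholder names are illustrative assumptions, not a prescribed scheme.

```python
# Sketch: normalize code-like spans and benchmark-table rows in technical notes.
import re

CODE_FENCE = re.compile(r"`{3}.*?`{3}", re.DOTALL)        # fenced code blocks
INLINE_CODE = re.compile(r"`[^`]+`")                       # inline code spans
TABLE_ROW = re.compile(r"^\s*\|.*\|\s*$", re.MULTILINE)    # pipe-delimited table rows

def normalize_note(text: str) -> str:
    text = CODE_FENCE.sub(" [CODE] ", text)
    text = INLINE_CODE.sub(" [CODE] ", text)
    text = TABLE_ROW.sub(" [TABLE_ROW] ", text)
    # Collapse whitespace but keep casing so terms like "HBM" and "KV-cache" survive.
    return re.sub(r"\s+", " ", text).strip()
```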
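A minimal TF-IDF baseline could look like the following, assuming scikit-learn and hypothetical train_texts / train_labels arrays; the hyperparameters are placeholders to be tuned on a validation split.

```python
# Sketch: TF-IDF + logistic regression baseline for the three-way task.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

baseline = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), min_df=5, sublinear_tf=True)),
    ("clf", LogisticRegression(max_iter=1000, class_weight="balanced")),
])
# baseline.fit(train_texts, train_labels)   # hypothetical training data
# preds = baseline.predict(test_texts)
```

This gives a cheap reference point: if the fine-tuned transformer does not clearly beat it on macro-F1, the extra GPU cost is hard to justify.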
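For the transformer baseline, a Hugging Face fine-tuning sketch that fits a single A10G is shown below; the model name, batch size, and epoch count are assumptions rather than recommendations, and train_ds / val_ds stand in for tokenized datasets.

```python
# Sketch: fine-tune an encoder classifier with Hugging Face Transformers.
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL_NAME = "distilroberta-base"  # assumed; any encoder that fits on an A10G works
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=3)

args = TrainingArguments(
    output_dir="arch-note-classifier",
    per_device_train_batch_size=32,
    num_train_epochs=3,
    fp16=True,  # mixed precision keeps memory well within A10G limits
)
# trainer = Trainer(model=model, args=args,
#                   train_dataset=train_ds, eval_dataset=val_ds)  # hypothetical datasets
# trainer.train()
```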
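For the error analysis plan, one concrete starting point is to tabulate confusion pairs and flag misclassified notes that mention more than one architecture; the keyword patterns below are illustrative and would need review against the actual corpus.

```python
# Sketch: surface confusion pairs and multi-architecture mentions among errors.
import re
from collections import Counter

ARCH_TERMS = {
    "MoE": re.compile(r"\b(mixture of experts|MoE|expert routing)\b", re.IGNORECASE),
    "FlashAttention": re.compile(r"\b(flash[- ]?attention|KV[- ]?cache)\b", re.IGNORECASE),
    "StateSpaceModel": re.compile(r"\b(state[- ]space|selective state|SSM)\b", re.IGNORECASE),
}

def analyze_errors(texts, y_true, y_pred):
    confusion = Counter()
    multi_arch_errors = []
    for text, gold, pred in zip(texts, y_true, y_pred):
        if gold == pred:
            continue
        confusion[(gold, pred)] += 1
        mentioned = [name for name, pat in ARCH_TERMS.items() if pat.search(text)]
        if len(mentioned) > 1:
            multi_arch_errors.append({"gold": gold, "pred": pred, "mentions": mentioned})
    return confusion, multi_arch_errors
```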