Business Context
LexiOps, an internal AI platform for a large SaaS company, serves employee requests such as policy Q&A, code help, incident summaries, and contract drafting. The platform currently sends every request to the same large model, incurring unnecessary cost and latency; the team wants an orchestration layer that routes each request to the right model and tool chain.
Data
- Volume: 3.5M historical prompts and responses, plus ~120K new requests per day
- Text length: 5 to 2,000 tokens per request; median 180 tokens
- Language: English (95%), mixed multilingual content in the remainder
- Labels: Historical traces include task type, selected model/tool path, latency, cost, user rating, and fallback events
- Class distribution: Heavily skewed toward general Q&A and summarization; low-frequency classes include legal drafting and incident analysis
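The skew toward general Q&A and summarization matters at training time: without correction, the router will underfit the rare legal-drafting and incident-analysis paths. A minimal sketch of deriving balanced class weights with scikit-learn, using hypothetical label names and counts that mirror the skew described above:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical label distribution mirroring the skew described above.
labels = (
    ["general_qa"] * 6000
    + ["summarization"] * 3000
    + ["code_help"] * 700
    + ["incident_analysis"] * 200
    + ["legal_drafting"] * 100
)

classes = np.array(sorted(set(labels)))
# "balanced" assigns each class weight n_samples / (n_classes * count),
# so rare classes receive proportionally larger weights.
weights = compute_class_weight(class_weight="balanced", classes=classes, y=labels)
weight_map = dict(zip(classes, weights))
print(weight_map)
```

The resulting weights can be passed to the training loss (e.g. a weighted cross-entropy) to counteract the imbalance.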
Success Criteria
A good solution should:
- Route requests to the correct orchestration path with >= 88% macro-F1 on offline evaluation
- Reduce average inference cost by >= 30%
- Keep p95 routing latency under 50 ms
- Preserve answer quality within 2 percentage points of the current single-model baseline
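The macro-F1 target can be checked offline with scikit-learn. Macro averaging scores each class equally, so rare paths such as legal drafting count as much as the dominant Q&A class; a small sketch with hypothetical labels:

```python
from sklearn.metrics import f1_score

# Hypothetical routing labels: ground truth vs. classifier predictions.
y_true = ["qa", "qa", "summarize", "code", "legal", "qa", "code", "summarize"]
y_pred = ["qa", "qa", "summarize", "code", "qa", "qa", "code", "summarize"]

# Macro-F1 averages per-class F1 with equal weight, so the single missed
# "legal" example drags the score down sharply.
macro_f1 = f1_score(y_true, y_pred, average="macro")
print(f"macro-F1: {macro_f1:.3f}  (target: >= 0.88)")
```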
Constraints
- Routing must run before generation, so the classifier must be lightweight
- Some requests require retrieval, some require function calling, and some must be blocked or escalated
- Sensitive prompts cannot be sent to external APIs
- The system must support fallback when the first selected model fails or exceeds latency budget
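The constraints above map naturally onto a rule layer that runs between classification and generation: it enforces the data-residency rule and attaches an ordered fallback chain that the orchestrator can walk when a model fails or exceeds its latency budget. A minimal sketch, assuming a hypothetical keyword-based sensitivity check and made-up model names (`internal-llm`, `external-llm`):

```python
from dataclasses import dataclass

# Hypothetical sensitivity markers; a real system would use a proper
# PII/sensitivity detector, not a keyword list.
SENSITIVE_MARKERS = ("ssn", "salary", "password")

@dataclass
class Route:
    path: str              # orchestration path predicted by the classifier
    model_chain: list[str] # ordered fallback chain, first entry tried first

def apply_policy(prompt: str, predicted_path: str) -> Route:
    """Rule layer that runs after classification but before generation."""
    # Constraint: sensitive prompts never leave the internal deployment.
    if any(marker in prompt.lower() for marker in SENSITIVE_MARKERS):
        return Route(predicted_path, ["internal-llm"])
    # Otherwise allow an external model with an internal fallback, so the
    # orchestrator can retry when the first model fails or runs over budget.
    return Route(predicted_path, ["external-llm", "internal-llm"])

route = apply_policy("What is our password rotation policy?", "policy_qa")
print(route.model_chain)  # sensitive -> internal only
```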
Requirements
- Build an NLP routing system that classifies each incoming request into an orchestration path.
- Design preprocessing for noisy prompts, code snippets, markdown, and multilingual text.
- Implement a modern Python solution using transformers for routing classification.
- Include model training, evaluation, and a simple rule-based policy layer for constraints.
- Explain how you would connect the classifier to downstream LLM, RAG, and tool-calling workflows.
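For the preprocessing requirement, one possible normalization pass over noisy prompts: collapse fenced and inline code into a placeholder token so the router can use "code is present" as a signal without memorizing snippet contents, then strip markdown markup and extra whitespace. The `[CODE]` placeholder and the regexes are illustrative assumptions, not a prescribed design:

```python
import re

# Fenced and inline code are replaced by a placeholder token.
FENCE = "`" * 3
CODE_FENCE = re.compile(re.escape(FENCE) + r".*?" + re.escape(FENCE), re.DOTALL)
INLINE_CODE = re.compile(r"`[^`]+`")
MD_MARKUP = re.compile(r"[#*_>]+")
WHITESPACE = re.compile(r"\s+")

def preprocess(prompt: str) -> str:
    """Normalize a noisy prompt before it reaches the routing classifier."""
    text = CODE_FENCE.sub(" [CODE] ", prompt)
    text = INLINE_CODE.sub(" [CODE] ", text)
    text = MD_MARKUP.sub(" ", text)  # strip markdown markup characters
    return WHITESPACE.sub(" ", text).strip()

sample = f"## Help\nWhy does {FENCE}py\nx = 1{FENCE} fail?"
print(preprocess(sample))  # -> "Help Why does [CODE] fail?"
```

The normalized text would then feed a lightweight transformer classifier (per the latency constraint), whose predicted path the policy layer maps to the downstream LLM, RAG, or tool-calling workflow.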