Business Context
NovaDesk, a SaaS support platform, is building a GenAI assistant that answers customer questions from product documentation, release notes, and historical support articles. The team wants you to justify which embedding approach to use and to implement an evaluation pipeline for retrieval quality.
Data
- Corpus size: 180,000 documents split into ~1.4M chunks
- Text length: 30-1,200 words per document; chunk size target 200-400 tokens (a chunking sketch follows this list)
- Language: English only
- Query volume: ~25,000 user questions per day
- Labels available: 12,000 query-document relevance pairs from support logs
- Class balance: Highly sparse relevance; most retrieved chunks are non-relevant
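
For reference, a minimal chunking sketch targeting the 200-400 token range above. Whitespace tokens are used as a rough proxy, and the `target` and `overlap` values are illustrative defaults rather than part of the spec:

```python
def chunk_document(text: str, target: int = 300, overlap: int = 50) -> list[str]:
    """Sliding-window chunking toward the 200-400 token target.

    Whitespace tokens are a rough proxy here; a production pipeline
    would count tokens with the embedding model's own tokenizer.
    """
    words = text.split()
    if len(words) <= target:
        return [text]
    chunks, step = [], target - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + target]))
        if start + target >= len(words):
            break
    return chunks
```

Overlapping windows keep sentences that straddle a chunk boundary retrievable from at least one chunk.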
Success Criteria
A good solution should improve semantic retrieval over a TF-IDF baseline, demonstrate strong ranking quality (e.g., recall@k and nDCG) on held-out relevance pairs, and support production inference with p95 embedding latency under 80 ms per query.
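
As one concrete reading of these criteria, a sketch of the offline ranking metrics and the latency check; `embed_fn` is a placeholder for whatever per-query embedding call the candidate pipeline exposes:

```python
import time
import numpy as np

def recall_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of a query's relevant docs that appear in the top k."""
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / max(len(relevant_ids), 1)

def ndcg_at_k(ranked_ids, relevant_ids, k=10):
    """Binary-relevance nDCG@k: DCG of the ranking over the ideal DCG."""
    rel = set(relevant_ids)
    dcg = sum(1.0 / np.log2(i + 2)
              for i, d in enumerate(ranked_ids[:k]) if d in rel)
    ideal = sum(1.0 / np.log2(i + 2) for i in range(min(len(rel), k)))
    return dcg / ideal if ideal > 0 else 0.0

def p95_latency_ms(embed_fn, queries):
    """Wall-clock p95 of per-query embedding time, in milliseconds."""
    times = []
    for q in queries:
        t0 = time.perf_counter()
        embed_fn(q)
        times.append((time.perf_counter() - t0) * 1000)
    return float(np.percentile(times, 95))
```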
Constraints
- Must run in a VPC; no external API calls at inference time
- Index must fit on a single CPU search node plus one GPU box for offline embedding jobs (see the indexing sketch after this list)
- Weekly re-indexing is acceptable; full retraining is not
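
A sketch of an offline embedding job compatible with these constraints, assuming sentence-transformers and FAISS; the model name, batch size, and output path are illustrative only:

```python
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

# Local model weights only -- satisfies the no-external-API constraint.
# all-MiniLM-L6-v2 is an illustrative choice, not a mandate from the brief.
model = SentenceTransformer("all-MiniLM-L6-v2", device="cuda")

def build_weekly_index(chunks: list[str], out_path: str = "chunks.faiss"):
    """Offline GPU embedding job; the flat index is then served on the CPU node."""
    vecs = model.encode(chunks, batch_size=256,
                        normalize_embeddings=True,  # cosine via inner product
                        show_progress_bar=True).astype(np.float32)
    index = faiss.IndexFlatIP(vecs.shape[1])
    index.add(vecs)
    faiss.write_index(index, out_path)
```

With a 384-dimensional model, 1.4M chunks in a flat inner-product index come to roughly 2 GB, comfortably within a single CPU search node.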
Requirements
- Build a retrieval pipeline for a GenAI assistant using dense embeddings.
- Compare at least two embedding approaches, including one sentence-transformer model.
- Define a realistic chunking and preprocessing strategy for technical documentation.
- Implement indexing, retrieval, and offline evaluation in modern Python (a minimal retrieval-plus-baseline sketch follows this list).
- Explain when you would choose domain-specific embeddings, general-purpose embeddings, or TF-IDF hybrids.
- Report retrieval metrics and recommend one production-ready embedding approach.
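
To make the comparison concrete, a minimal retrieval harness covering the dense index from the earlier sketch and a TF-IDF baseline over the same chunks; chunk row positions stand in for document ids, and all names are assumptions rather than mandated interfaces:

```python
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

# Dense retrieval over the index built by the offline job sketched earlier.
model = SentenceTransformer("all-MiniLM-L6-v2")   # illustrative model
index = faiss.read_index("chunks.faiss")          # illustrative path

def dense_search(query: str, k: int = 10) -> np.ndarray:
    q = model.encode([query], normalize_embeddings=True).astype(np.float32)
    _, ids = index.search(q, k)
    return ids[0]                                 # chunk row positions

def make_tfidf_search(chunks: list[str]):
    """TF-IDF baseline over the same chunk list, for a fair comparison."""
    vec = TfidfVectorizer(stop_words="english")
    mat = vec.fit_transform(chunks)
    def search(query: str, k: int = 10) -> np.ndarray:
        sims = linear_kernel(vec.transform([query]), mat).ravel()
        return np.argsort(sims)[::-1][:k]
    return search
```

Running both search functions over the held-out relevance pairs and feeding the results through the metric helpers above yields the side-by-side report the final requirement asks for.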