Explain Vector Search in RAG

Scenario

You are building a document-grounded assistant that answers an internal operations team's questions over policy manuals, underwriting guidelines, and process documentation. The corpus contains about 200,000 documents and is updated frequently, and users expect concise answers with citations rather than long summaries. Keyword search alone misses semantically similar phrasing, so the team is considering vector search as part of a retrieval-augmented generation (RAG) pipeline. The assistant will be used in a high-trust workflow where an unsupported answer is worse than a refusal.
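To make the contrast with keyword search concrete, here is a toy sketch of what vector search computes: it ranks documents by cosine similarity between a query embedding and each document embedding. The document IDs, texts, and 3-dimensional vectors are hypothetical and hand-picked for illustration. In this system the vectors would come from the approved embedding model, and a managed vector database would typically use an approximate nearest-neighbour index rather than the exhaustive scan shown here. `keyword_overlap` is a crude lexical stand-in, not BM25.

```python
import math

# Hypothetical documents with hand-picked 3-d "embeddings" (illustration only).
# Real embeddings would come from the approved embedding model.
DOCS: dict[str, tuple[str, list[float]]] = {
    "policy-a": ("Claims must be filed within 30 days", [0.9, 0.1, 0.0]),
    "policy-b": ("Submit a loss notice no later than one month after the incident", [0.85, 0.15, 0.05]),
    "guide-c": ("Underwriting requires two years of financial statements", [0.05, 0.2, 0.95]),
}


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of the vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def vector_top_k(query_vec: list[float], k: int) -> list[str]:
    """Exhaustive nearest-neighbour search: score every document, keep the k best."""
    ranked = sorted(DOCS, key=lambda doc_id: cosine(query_vec, DOCS[doc_id][1]), reverse=True)
    return ranked[:k]


def keyword_overlap(query: str, text: str) -> int:
    """Crude lexical stand-in (not BM25): number of distinct shared lowercase tokens."""
    return len(set(query.lower().split()) & set(text.lower().split()))


if __name__ == "__main__":
    query_text = "deadline to report a loss"
    query_vec = [0.88, 0.12, 0.02]  # hypothetical embedding of query_text
    print(vector_top_k(query_vec, k=2))  # ['policy-a', 'policy-b']
    for doc_id, (text, _vec) in DOCS.items():
        print(doc_id, keyword_overlap(query_text, text))  # policy-a 0, policy-b 2, guide-c 0
```

The query shares no tokens with `policy-a`, so a lexical match scores it zero, yet its vector is the closest to the query's. That is the gap vector search is meant to close; it says nothing yet about whether either document actually supports an answer.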

Constraints

p95 latency must stay under 2,000ms end to end
Cost ceiling is $12,000/month at 20,000 queries per day
Hallucinated or unsupported claims must stay below 2% on a labeled evaluation set
The system must resist prompt injection in retrieved documents and refuse when evidence is insufficient
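The cost ceiling translates into a per-query budget that every retrieval and generation choice has to fit inside. A minimal sketch of that arithmetic, assuming a 30-day month and treating the entire ceiling as if it scaled with query volume (both are simplifying assumptions):

```python
# Figures from the constraints above; the 30-day month is an assumption.
MONTHLY_BUDGET_USD = 12_000
QUERIES_PER_DAY = 20_000
DAYS_PER_MONTH = 30
SECONDS_PER_DAY = 86_400


def per_query_budget_usd(monthly_budget: float, queries_per_day: int, days: int = DAYS_PER_MONTH) -> float:
    """Average spend allowed per query if the whole ceiling scales with traffic."""
    return monthly_budget / (queries_per_day * days)


def average_qps(queries_per_day: int) -> float:
    """Mean queries per second assuming perfectly uniform traffic (peaks will be higher)."""
    return queries_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    print(f"queries/month: {QUERIES_PER_DAY * DAYS_PER_MONTH}")  # 600000
    print(f"budget/query: ${per_query_budget_usd(MONTHLY_BUDGET_USD, QUERIES_PER_DAY):.4f}")  # $0.0200
    print(f"average QPS: {average_qps(QUERIES_PER_DAY):.3f}")  # 0.231
```

Roughly two cents per query has to cover embedding the query, the vector and BM25 lookups, any reranking, and the LLM call together. Fixed charges such as vector-index hosting would come out of the $12,000 first, so the real per-query margin is smaller than this figure.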

Available Resources

200,000 internal documents with titles, timestamps, and access-control metadata
An approved LLM API and embedding model, plus a managed vector database
Existing keyword search infrastructure for BM25 retrieval
Capacity to label a 300-question golden set and run weekly offline evals

Question

How would you design this RAG system, and specifically what role should vector search play relative to keyword retrieval, reranking, and answer generation given the latency, cost, and hallucination constraints?