You are building a document-grounded assistant for an internal operations team that answers questions over policy manuals, underwriting guidelines, and process documentation. The corpus contains about 200,000 documents with frequent updates, and users expect concise answers with citations rather than long summaries. Keyword search alone misses semantically similar phrasing, so the team is considering vector search as part of a retrieval-augmented generation pipeline. The assistant will be used in a high-trust workflow where unsupported answers are worse than refusals.
How would you design this RAG system? Specifically, what role should vector search play relative to keyword retrieval, reranking, and answer generation, given the latency, cost, and hallucination constraints?
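
To make the question concrete, here is a minimal sketch of one possible shape for the retrieval side. It is an assumption, not a prescribed design. It merges a keyword ranking and a vector ranking with reciprocal rank fusion (a technique swapped in here for illustration), reranks the merged candidates, and refuses when no candidate clears a score threshold. Every name in it (`reciprocal_rank_fusion`, `select_evidence`, `answer_or_refuse`, `rerank_score`, `generate`, `min_score`) is hypothetical. The constant `k=60` and the `0.5` threshold are placeholder values that would need tuning on real queries.

```python
from typing import Callable

REFUSAL = "I can't answer that from the available documents."


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Merge ranked lists of doc ids; each list adds 1 / (k + rank) per doc."""
    fused: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Best fused score first; ties broken by doc id for determinism.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


def select_evidence(
    keyword_ranking: list[str],
    vector_ranking: list[str],
    rerank_score: Callable[[str], float],  # hypothetical reranker, higher is better
    min_score: float = 0.5,                # assumed threshold, tune on real data
    top_n: int = 3,
) -> list[str]:
    """Return up to top_n doc ids that pass the rerank threshold (may be empty)."""
    fused = reciprocal_rank_fusion([keyword_ranking, vector_ranking])
    # Only rerank a small candidate pool to keep latency and cost bounded.
    candidates = [doc_id for doc_id, _ in fused[: top_n * 3]]
    scored = [(doc_id, rerank_score(doc_id)) for doc_id in candidates]
    kept = [(doc_id, score) for doc_id, score in scored if score >= min_score]
    kept.sort(key=lambda pair: (-pair[1], pair[0]))
    return [doc_id for doc_id, _ in kept[:top_n]]


def answer_or_refuse(evidence_ids: list[str], generate: Callable[[list[str]], str]) -> str:
    """Call the generator only when there is supporting evidence; otherwise refuse."""
    if not evidence_ids:
        return REFUSAL
    return generate(evidence_ids)


if __name__ == "__main__":
    keyword = ["uw-12", "proc-7", "pol-3"]   # made-up doc ids from a keyword index
    vector = ["pol-3", "uw-12", "pol-9"]     # made-up doc ids from a vector index
    scores = {"uw-12": 0.9, "pol-3": 0.7, "proc-7": 0.2, "pol-9": 0.4}
    evidence = select_evidence(keyword, vector, lambda d: scores.get(d, 0.0))
    print(answer_or_refuse(evidence, lambda ids: "Answer citing " + ", ".join(ids)))
```

In this sketch vector search is only a candidate generator alongside keyword retrieval. The reranker decides what counts as evidence, and the generator is never called without it, which reflects the requirement that a refusal is preferable to an unsupported answer.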