You are building an internal question-answering assistant for an enterprise search tool that helps employees find answers across ~2 million documents, including policy PDFs, engineering runbooks, product manuals, and support articles. The corpus contains long documents with tables, repeated boilerplate, versioned content, and frequent updates, and many user questions require grounding in multiple passages rather than a single snippet. You have click logs and a small set of manually judged query-document pairs, but no large supervised QA dataset. The system must return answers with citations and reduce hallucinations when the retrieved evidence is weak or conflicting.
How would you design and implement a retrieval-augmented generation (RAG) pipeline for this product, covering preprocessing, retrieval, generation, and evaluation? How would you handle failure modes such as stale content, poor chunking, and unsupported answers?