Business Context
Northstar Bank wants to deploy a retrieval-augmented generation (RAG) assistant for employees to answer questions over internal policies, audit procedures, and product documentation. The platform team needs a practical design that works in a regulated enterprise environment rather than a demo chatbot.
Data
- Corpus size: ~2.5 million documents across PDFs, Confluence pages, ticket notes, and policy manuals
- Text length: 1 sentence to 80 pages; many documents contain tables, headers, and repeated boilerplate
- Language: English primarily, with ~12% multilingual content
- Freshness: 5,000-10,000 document updates per day
- Label availability: Limited supervised data; about 8,000 historical Q&A pairs and 2,000 manually judged retrieval examples
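Given the wide length range and the repeated boilerplate noted above, one possible preprocessing step is sketched below. It is a minimal illustration, not a prescribed design: the function names, the 50% document-frequency threshold, and the word-window sizes are all assumptions, and a real pipeline would likely chunk on tokens or document structure rather than whitespace-split words.

```python
from collections import Counter


def strip_repeated_boilerplate(docs, min_doc_fraction=0.5):
    """Drop lines that recur in at least `min_doc_fraction` of the documents.

    Hypothetical heuristic: headers, footers, and disclaimers tend to appear
    verbatim in many documents, so high document frequency marks boilerplate.
    """
    line_doc_counts = Counter()
    for text in docs:
        # Count each distinct non-empty line once per document.
        line_doc_counts.update({ln.strip() for ln in text.splitlines() if ln.strip()})
    threshold = max(2, int(min_doc_fraction * len(docs)))
    boilerplate = {ln for ln, n in line_doc_counts.items() if n >= threshold}
    cleaned = []
    for text in docs:
        kept = [ln.strip() for ln in text.splitlines()
                if ln.strip() and ln.strip() not in boilerplate]
        cleaned.append("\n".join(kept))
    return cleaned


def chunk_words(text, max_words=200, overlap=40):
    """Split text into overlapping word windows; short texts yield one chunk."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    if not words:
        return []
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reached the end of the text
    return chunks
```

The overlap keeps a sentence that straddles a window boundary retrievable from either neighbouring chunk, at the cost of some index growth.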
Success Criteria
A good solution should return grounded answers with citation coverage on at least 90% of responses, achieve strong top-k retrieval quality on held-out enterprise queries (measured, for example, with recall@k or MRR), and keep p95 end-to-end latency below 2 seconds for interactive use. The system should also reduce hallucinations, enforce document-level access control, and support monitoring for drift and content freshness.
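The two numeric targets above (citation coverage of at least 90% and p95 latency below 2 seconds) could be checked from response logs roughly as follows. This is a sketch under assumptions: the `ResponseLog` record and its fields are hypothetical, coverage is taken to mean "the response cites at least one document", and p95 uses the nearest-rank definition. It does not verify that a cited passage actually supports the answer, which needs a separate grounding check.

```python
from dataclasses import dataclass, field


@dataclass
class ResponseLog:
    """Hypothetical per-response log entry."""
    latency_ms: float
    cited_doc_ids: list = field(default_factory=list)


def citation_coverage(logs):
    """Fraction of responses that cite at least one source document."""
    if not logs:
        return 0.0
    return sum(1 for r in logs if r.cited_doc_ids) / len(logs)


def p95_latency_ms(logs):
    """Nearest-rank 95th percentile of end-to-end latency."""
    latencies = sorted(r.latency_ms for r in logs)
    if not latencies:
        raise ValueError("no responses logged")
    # Integer form of ceil(0.95 * n), avoiding floating-point rounding.
    rank = (95 * len(latencies) + 99) // 100
    return latencies[rank - 1]


def meets_targets(logs, min_coverage=0.90, max_p95_ms=2000.0):
    """True when both targets from the success criteria are satisfied."""
    return (citation_coverage(logs) >= min_coverage
            and p95_latency_ms(logs) < max_p95_ms)
```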
Constraints
- Sensitive data must remain inside the company VPC
- Role-based access control must be preserved during indexing and retrieval
- Budget limits prohibit very large always-on GPU clusters
- Auditability is required for every answer and cited source
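The access-control and auditability constraints above might translate into a permission filter on retrieved chunks plus a per-answer audit record, sketched below. The chunk fields (`doc_id`, `doc_version`, `allowed_roles`) and the record layout are assumptions made for illustration; the brief does not specify a schema.

```python
import hashlib
import json
from datetime import datetime, timezone


def filter_by_access(candidates, user_roles):
    """Keep only chunks whose allowed roles intersect the user's roles."""
    roles = set(user_roles)
    return [c for c in candidates if roles & set(c["allowed_roles"])]


def build_audit_record(user_id, query, answer, cited_chunks):
    """Serialize who asked what, a hash of the answer, and the cited sources."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
        "query": query,
        # Store a digest so the record can be matched to the stored answer
        # without duplicating potentially sensitive text.
        "answer_sha256": hashlib.sha256(answer.encode("utf-8")).hexdigest(),
        "cited": [
            {"doc_id": c["doc_id"], "doc_version": c["doc_version"]}
            for c in cited_chunks
        ],
    }
    return json.dumps(record, sort_keys=True)
```

Filtering after a nearest-neighbour search can leave fewer than k usable results, so a design along these lines would probably over-fetch candidates or partition the index by permission group.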
Requirements
- Design an enterprise RAG pipeline covering ingestion, chunking, embedding, indexing, retrieval, reranking, and answer generation.
- Explain the primary deployment challenges: data quality, permissions, latency, hallucination risk, evaluation, and document freshness.
- Propose a modern Python implementation using sentence-transformers, FAISS, and a lightweight reranker.
- Describe preprocessing for noisy enterprise documents, including OCR cleanup and metadata normalization.
- Define how you would evaluate retrieval quality, answer grounding, and operational reliability before production rollout.
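For the retrieval-quality part of the evaluation requirement, the manually judged retrieval examples could be scored with standard ranking metrics. The sketch below computes recall@k and mean reciprocal rank (MRR); the choice of these two metrics, the input layout, and the function names are assumptions rather than something the brief mandates.

```python
def recall_at_k(retrieved, relevant, k):
    """Share of the relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(set(relevant))


def reciprocal_rank(retrieved, relevant):
    """1 / rank of the first relevant result, or 0.0 if none is retrieved."""
    relevant = set(relevant)
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate_retrieval(judged, k=5):
    """Average both metrics over (retrieved_ids, relevant_ids) pairs."""
    if not judged:
        raise ValueError("no judged examples")
    n = len(judged)
    return {
        "recall@k": sum(recall_at_k(r, rel, k) for r, rel in judged) / n,
        "mrr": sum(reciprocal_rank(r, rel) for r, rel in judged) / n,
    }
```

Running the same function on results before and after reranking gives a direct view of what the reranker contributes on held-out queries.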