
You're building a system in which a language model must answer questions grounded in historical records rather than relying only on its pretraining. The collection includes OCR'd census pages, immigration manifests, city directories, and family tree notes, so both retrieval quality and grounded generation matter.
How would you design a retrieval-augmented generation (RAG) workflow for historical records search?
- Hybrid retrieval for noisy historical text
- Vector search and metadata-aware filtering
- Grounded answer generation with citations
- Evaluation of retrieval quality and hallucination risk
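For hybrid retrieval with metadata-aware filtering, one possible shape is to filter on structured fields first, score the remaining candidates with a lexical channel and a vector channel, then merge the two rankings with reciprocal rank fusion (RRF). The sketch below is a minimal, stdlib-only illustration under stated assumptions: the `Record` schema and the `hybrid_search` helper are hypothetical, a plain term-overlap count stands in for BM25, and cosine similarity over character-trigram count vectors stands in for a dense embedding model (trigrams also give some tolerance to OCR misspellings such as "Schmldt" for "Schmidt"). The fusion constant `k=60` is a commonly used default, not a tuned value.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    # Hypothetical schema; real records would carry more metadata fields.
    record_id: str
    text: str
    year: int
    source_type: str  # e.g. "census", "manifest", "directory", "tree_note"


def tokens(text: str) -> list[str]:
    # Lowercased alphanumeric tokens for the lexical channel.
    return re.findall(r"[a-z0-9]+", text.lower())


def trigram_vector(text: str) -> Counter:
    # Character-trigram counts; a toy stand-in for a dense embedding.
    padded = f"  {text.lower()}  "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def lexical_score(query: str, doc: str) -> float:
    # Term-overlap count; a stand-in for BM25.
    doc_terms = Counter(tokens(doc))
    return float(sum(doc_terms[t] for t in set(tokens(query))))


def ranked_ids(scores: dict[str, float]) -> list[str]:
    # Best first, dropping records with no match at all in this channel.
    ordered = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [rid for rid, score in ordered if score > 0]


def hybrid_search(
    query: str,
    records: list[Record],
    year_range: Optional[tuple[int, int]] = None,
    source_types: Optional[set[str]] = None,
    k: int = 60,
    top_n: int = 5,
) -> list[str]:
    # 1. Metadata-aware filtering before any scoring.
    candidates = [
        r for r in records
        if (year_range is None or year_range[0] <= r.year <= year_range[1])
        and (source_types is None or r.source_type in source_types)
    ]
    query_vec = trigram_vector(query)
    lexical = ranked_ids({r.record_id: lexical_score(query, r.text) for r in candidates})
    vector = ranked_ids({r.record_id: cosine(query_vec, trigram_vector(r.text)) for r in candidates})
    # 2. Reciprocal rank fusion: each ranking adds 1 / (k + rank) per record.
    fused: dict[str, float] = {}
    for ranking in (lexical, vector):
        for position, rid in enumerate(ranking, start=1):
            fused[rid] = fused.get(rid, 0.0) + 1.0 / (k + position)
    return sorted(fused, key=fused.get, reverse=True)[:top_n]
```

Fusing by rank rather than raw score avoids calibrating the two channels against each other, and applying the year and source-type filters before scoring keeps out-of-scope records from crowding the top results. In a production setup the two channels would more likely be a search index and a vector store, with the same filters pushed down into each.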
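For grounded answer generation with citations, one option is to pass the retrieved passages to the model keyed by record ID, instruct it to cite those IDs, and check the draft mechanically before showing it to a user. The sketch below covers only prompt assembly and the post-hoc check; the model call is left out because it depends on the provider. The function names, the square-bracket citation format, and the assumption that citations appear before each sentence's closing punctuation are hypothetical choices for illustration.

```python
import re

# Hypothetical citation format: a record ID in square brackets, e.g. [c1900-17].
CITATION = re.compile(r"\[([A-Za-z0-9_-]+)\]")


def build_grounded_prompt(question: str, passages: dict[str, str]) -> str:
    # Record IDs serve as citation keys so every claim traces back to a source record.
    context = "\n".join(f"[{rid}] {text}" for rid, text in passages.items())
    return (
        "Answer using only the sources below. Cite every factual sentence with its "
        "source ID in square brackets. If the sources do not contain the answer, "
        "say so instead of guessing.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


def check_citations(answer: str, passages: dict[str, str]) -> dict[str, list[str]]:
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    uncited: list[str] = []
    unknown: list[str] = []
    for sentence in sentences:
        cited = CITATION.findall(sentence)
        if not cited:
            uncited.append(sentence)
        # IDs the model cited that were never retrieved are a hallucination signal.
        unknown.extend(c for c in cited if c not in passages)
    return {"uncited_sentences": uncited, "unknown_citations": unknown}
```

A sentence that passes this check is cited, not necessarily supported: confirming that the cited record actually states the claim still needs a human reviewer or a separate entailment or judge step.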
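For evaluating retrieval quality, a common starting point is a small labeled set of questions, each mapped to the record IDs a careful researcher would accept as relevant, scored with recall@k and mean reciprocal rank (MRR). The sketch below assumes hypothetical `runs` (query ID to ranked record IDs) and `qrels` (query ID to relevant record IDs) dictionaries. It does not measure hallucination directly; that usually requires per-claim support judgments on the generated answers.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    # Fraction of the relevant records that appear in the top-k results.
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def reciprocal_rank(retrieved: list[str], relevant: set[str]) -> float:
    # 1 / rank of the first relevant result; 0 if none was retrieved.
    for position, rid in enumerate(retrieved, start=1):
        if rid in relevant:
            return 1.0 / position
    return 0.0


def evaluate(runs: dict[str, list[str]], qrels: dict[str, set[str]], k: int = 5) -> dict[str, float]:
    # Macro-average over queries that have at least one relevance judgment.
    queries = [q for q in qrels if qrels[q]]
    if not queries:
        return {f"recall@{k}": 0.0, "mrr": 0.0}
    n = len(queries)
    return {
        f"recall@{k}": sum(recall_at_k(runs.get(q, []), qrels[q], k) for q in queries) / n,
        "mrr": sum(reciprocal_rank(runs.get(q, []), qrels[q]) for q in queries) / n,
    }
```

Reporting recall@k separately per source type (census, manifest, directory, tree note) can show whether OCR noise in one collection is dragging down the overall number.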