You are building a research assistant for scientists who ask technical questions about large language models, transformer architectures, and AlphaFold-style biological applications. The assistant must answer from a curated corpus of papers, internal notes, and benchmark summaries rather than free-form model knowledge. Users expect concise, citation-backed answers they can trust during literature review and experiment planning. The initial launch targets a few hundred researchers, but the corpus already contains tens of thousands of documents and will grow quickly.
How would you build this system so it can answer first-round interview-style questions about LLMs, transformers, and AlphaFold in biological settings while meeting the latency, cost, grounding, and safety requirements? Explain the design you would choose and how you would evaluate and operate it in production.