You are building a research assistant for biologists who submit protein sequences and free-text questions such as likely function, domain hints, mutation impact hypotheses, and relevant literature. The assistant must combine sequence-derived evidence with retrieved biological references and return a concise, structured answer that helps researchers prioritize follow-up experiments rather than replace wet-lab validation. You expect a few thousand analyses per day, but some users will batch hundreds of sequences and expect interactive turnaround for single-sequence queries.
How would you build this system so that it produces grounded, useful protein analysis under these constraints, and how would you evaluate and operate it to control hallucination, prompt injection risk, latency, and cost as usage grows?