You are building a production-grade agentic assistant for an internal operations and analysis workflow. The system must answer questions, retrieve internal knowledge, call external APIs, and complete multi-step tasks across several data sources with minimal human intervention. It will serve a few hundred daily active users at launch, but leadership expects rapid expansion to multiple teams and higher-stakes workflows. The current prototype works on simple cases but is slow, expensive, and occasionally fabricates unsupported actions or conclusions.
How would you design this agentic platform so it can reliably orchestrate multiple models, tools, and data sources in production while meeting the latency, cost, and safety requirements? Explain the architecture you would choose, how you would ground and evaluate the agent, and how you would handle failure modes such as hallucination, bad tool use, and prompt injection.