You are building an internal operations copilot for a mid-sized financial services platform. The assistant helps employees answer account questions, draft case notes, and take low-risk actions such as fetching customer records or creating follow-up tasks. It serves a few hundred internal users today, but leadership wants to expand it to all operations staff if it can stay reliable and auditable. The main concern is that the agent may call tools incorrectly, follow malicious instructions from tool outputs, or take actions without enough evidence.
How would you build this agent so it can call tools safely and reliably in production? Explain the design you would choose, how you would evaluate it before launch, and how you would control hallucination, prompt injection, latency, and cost while still keeping the system useful.
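To make the three concerns above concrete, here is a minimal sketch of one possible pre-execution gate for tool calls. It is a starting point for discussion, not a definitive design. Every name in it (`ToolSpec`, `REGISTRY`, `check_tool_call`, `wrap_tool_output`, the two tool names, and the `evidence_ids` parameter) is a hypothetical assumption, not part of any real framework. The sketch rejects unknown tools and malformed arguments, refuses state-changing actions that cite no evidence, labels tool output as untrusted data, and records every decision for audit.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ToolSpec:
    """Hypothetical description of one allowlisted tool."""
    name: str
    required_args: dict[str, type]  # argument name -> expected Python type
    writes: bool = False            # True if the tool changes state


@dataclass
class Decision:
    allowed: bool
    reason: str


# Hypothetical allowlist; a real deployment would load this from config.
REGISTRY = {
    "fetch_customer_record": ToolSpec(
        "fetch_customer_record", {"customer_id": str}
    ),
    "create_followup_task": ToolSpec(
        "create_followup_task", {"customer_id": str, "note": str}, writes=True
    ),
}


def check_tool_call(
    name: str,
    args: dict[str, Any],
    evidence_ids: list[str],
    audit_log: list[dict[str, Any]],
) -> Decision:
    """Decide whether a model-proposed tool call may run, and log the decision."""
    spec = REGISTRY.get(name)
    if spec is None:
        decision = Decision(False, "unknown tool")
    elif set(args) != set(spec.required_args):
        decision = Decision(False, "unexpected or missing arguments")
    elif any(not isinstance(args[k], t) for k, t in spec.required_args.items()):
        decision = Decision(False, "argument type mismatch")
    elif spec.writes and not evidence_ids:
        # Assumption: write actions must cite at least one retrieved record.
        decision = Decision(False, "write action without cited evidence")
    else:
        decision = Decision(True, "ok")

    # Every decision is recorded, allowed or not, so it can be audited later.
    audit_log.append(
        {"tool": name, "args": args, "evidence": list(evidence_ids),
         "allowed": decision.allowed, "reason": decision.reason}
    )
    return decision


def wrap_tool_output(name: str, payload: str) -> dict[str, Any]:
    """Label tool output as untrusted data so downstream code never treats it as instructions."""
    return {"role": "tool", "tool": name, "trusted": False, "content": payload}
```

In this sketch the gate runs outside the model, so a malicious instruction embedded in a tool result cannot widen the allowlist or skip the evidence check. It can at most lead the model to propose a call that the gate then rejects and logs.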