You are building an internal AI assistant for a mid-sized SaaS support team. The assistant must answer account and product questions, gather missing details, and call approved backend tools to check order status, retrieve account metadata, and draft structured case notes. It will handle roughly 8,000 conversations per day, and many requests require multi-turn context rather than a single prompt. The current prototype is verbose, occasionally calls the wrong tool, and sometimes returns JSON that downstream systems cannot parse.
How would you design the prompting, context management, structured outputs, and tool/function-calling flow for this assistant so it is reliable under these constraints? Explain how you would evaluate it before launch and what safeguards you would add for hallucinations, prompt injection, and malformed outputs.