Design a Tool-Calling Support Assistant

Scenario

You are building an internal AI assistant for a mid-sized SaaS support team. The assistant must answer account and product questions, gather missing details, and call approved backend tools to check order status, retrieve account metadata, and draft structured case notes. It will handle roughly 8,000 conversations per day, and many requests require multi-turn context rather than a single prompt. The current prototype is verbose, occasionally calls the wrong tool, and sometimes returns JSON that downstream systems cannot parse.

Constraints

p95 end-to-end latency: 2,500 ms for simple requests, 4,000 ms when a tool is called
Cost ceiling: $12K/month at projected volume
Incorrect tool-call rate: <2% on a labeled eval set
Structured output parse success: >99.5%
Must resist prompt injection in user messages and tool results, and must not expose hidden instructions or sensitive fields
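
To make the cost ceiling concrete, it can help to convert it into a per-conversation budget. The sketch below uses the 8,000 conversations per day from the scenario and assumes a 30-day month; the function name and the 30-day figure are illustrative assumptions, not part of the original problem statement.

```python
def per_conversation_budget(monthly_ceiling_usd: float,
                            conversations_per_day: int,
                            days_per_month: int = 30) -> float:
    """Average spend allowed per conversation under a monthly cost ceiling.

    days_per_month=30 is an assumption made for this illustration.
    """
    monthly_conversations = conversations_per_day * days_per_month
    return monthly_ceiling_usd / monthly_conversations


# $12K/month ceiling at roughly 8,000 conversations/day (figures from the scenario)
budget = per_conversation_budget(12_000, 8_000)
print(f"{budget:.3f} USD per conversation")
```

Under these assumptions the budget works out to about $0.05 per conversation, covering every model call and retry across all turns, which is one reason to consider routing simple requests to the cheaper fallback model.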

Available Resources

One approved frontier model and one cheaper fallback model with tool-calling support
Three internal read-only tools: account lookup, order status lookup, and ticket creation draft
2,000 historical support chats and 300 manually labeled evaluation examples
Capacity for 50 new labels per week from support specialists
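
One thing worth checking is how tightly 300 labeled examples can verify the <2% incorrect tool-call target. The sketch below computes a 95% Wilson score interval for an observed error count; the choice of the Wilson interval and the helper name are assumptions made for illustration, and the sketch treats each eval example as one tool-call decision.

```python
import math


def wilson_interval(errors: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for an error rate (z=1.96 gives roughly 95%)."""
    p = errors / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# Strictly under 2% of 300 examples means at most 5 incorrect tool calls.
low, high = wilson_interval(errors=5, n=300)
print(f"observed {5 / 300:.2%}, 95% interval {low:.2%} to {high:.2%}")
```

Even at 5 errors out of 300, the upper end of the interval sits well above 2%, so passing on the current eval set does not by itself show that the true rate is under the target. The 50 new labels per week could be used to narrow this interval over time.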

Question

How would you design the prompting, context management, structured outputs, and tool/function-calling flow for this assistant so it is reliable under these constraints? Explain how you would evaluate it before launch and what safeguards you would add for hallucinations, prompt injection, and malformed outputs.
