Build a Safe Tool-Calling Agent

Scenario

You are building an internal operations copilot for a mid-sized financial services platform. The assistant helps employees answer account questions, draft case notes, and take low-risk actions such as fetching customer records or creating follow-up tasks. It serves a few hundred internal users today, but leadership wants to expand it to all operations staff if it can stay reliable and auditable. The main concern is that the agent may call tools incorrectly, follow malicious instructions from tool outputs, or take actions without enough evidence.

Constraints

p95 latency: 3,000 ms for read-only requests; 5,000 ms for action-taking requests
Cost ceiling: $0.04 per request on average at 200K requests/month
Hallucinated tool use or unsupported factual claims: <2% on a labeled eval set
Zero tolerance for unauthorized actions, prompt injection success, or cross-customer data leakage
All actions must be logged with inputs, outputs, and approval path
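
To make these targets concrete, here is a minimal sketch that encodes the constraints above as a small config and works out what they imply: 200,000 requests at a $0.04 average is a ceiling of about $8,000 per month, and "<2%" on a 1,000-example eval set means at most 19 failures. The names `Constraints`, `monthly_budget_usd`, `meets_hallucination_target`, and `p95` are illustrative assumptions, not part of any existing system, and the nearest-rank percentile is just one common convention.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Constraints:
    """Targets copied from the constraints list; field names are hypothetical."""
    p95_read_ms: int = 3_000
    p95_action_ms: int = 5_000
    avg_cost_per_request_usd: float = 0.04
    requests_per_month: int = 200_000
    max_failure_percent: int = 2  # strict upper bound: failures must be < 2%

    def monthly_budget_usd(self) -> float:
        # Average cost ceiling multiplied by expected monthly volume.
        return self.avg_cost_per_request_usd * self.requests_per_month

    def meets_hallucination_target(self, failures: int, total: int) -> bool:
        # Integer comparison avoids floating-point edge cases at exactly 2%.
        return failures * 100 < self.max_failure_percent * total


def p95(latencies_ms: list[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty list of latencies."""
    ordered = sorted(latencies_ms)
    rank = math.ceil(0.95 * len(ordered))
    return ordered[rank - 1]


if __name__ == "__main__":
    c = Constraints()
    print(f"Monthly budget ceiling: ${c.monthly_budget_usd():,.0f}")
    print("19/1000 failures passes:", c.meets_hallucination_target(19, 1_000))
    print("20/1000 failures passes:", c.meets_hallucination_target(20, 1_000))
```

Note that the zero-tolerance items (unauthorized actions, prompt injection success, cross-customer leakage) are deliberately not modeled as rates here, since any single occurrence is a failure.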

Available Resources

A GPT-4-class or Claude-class model with tool/function calling support
Internal tools for customer lookup, invoice retrieval, CRM note creation, and task creation
Historical support transcripts, tool logs, and 1,000 labeled examples for offline evaluation
A small human-review queue for escalations and approval of risky actions
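
As one possible starting point, the four internal tools listed above could be described in a small registry that records whether each one only reads data or takes an action. The sketch below assumes that lookup and retrieval are read-only and that note and task creation are actions; the tool identifiers, `ToolSpec`, `latency_budget_ms`, and `audit_record` are all hypothetical names chosen for illustration, and the record shape simply mirrors the logging constraint (inputs, outputs, approval path).

```python
from dataclasses import dataclass
from enum import Enum


class Risk(Enum):
    READ_ONLY = "read_only"
    ACTION = "action"


@dataclass(frozen=True)
class ToolSpec:
    name: str
    risk: Risk


# Hypothetical identifiers for the four internal tools; the risk
# classification is an assumption, not something the scenario specifies.
TOOLS = {
    spec.name: spec
    for spec in (
        ToolSpec("customer_lookup", Risk.READ_ONLY),
        ToolSpec("invoice_retrieval", Risk.READ_ONLY),
        ToolSpec("crm_note_create", Risk.ACTION),
        ToolSpec("task_create", Risk.ACTION),
    )
}


def get_tool(name: str) -> ToolSpec:
    # Reject anything outside the registry instead of guessing.
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name]


def latency_budget_ms(tool_names: list[str]) -> int:
    """p95 target for a request: 5,000 ms if any tool takes an action, else 3,000 ms."""
    takes_action = any(get_tool(n).risk is Risk.ACTION for n in tool_names)
    return 5_000 if takes_action else 3_000


def audit_record(name: str, inputs: dict, outputs: dict, approval_path: list[str]) -> dict:
    """Build a log entry with the fields the logging constraint asks for."""
    spec = get_tool(name)
    return {
        "tool": spec.name,
        "risk": spec.risk.value,
        "inputs": inputs,
        "outputs": outputs,
        "approval_path": approval_path,
    }
```

For example, `latency_budget_ms(["customer_lookup", "task_create"])` selects the 5,000 ms target because one of the tools takes an action.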

Question

How would you build this agent so it can call tools safely and reliably in production? Explain the design you would choose, how you would evaluate it before launch, and how you would control hallucination, prompt injection, latency, and cost while still keeping the system useful.
