Design a Multi-Tool AI Agent

Scenario

You are building a production-grade agentic assistant for an internal operations and analysis workflow. The system must answer questions, retrieve internal knowledge, call external APIs, and complete multi-step tasks across several data sources with minimal human intervention. It will serve a few hundred daily active users at launch, but leadership expects rapid expansion to multiple teams and higher-stakes workflows. The current prototype works on simple cases but is slow, expensive, and occasionally fabricates unsupported actions or conclusions.

Constraints

p95 end-to-end latency: 4,000ms for standard requests
Cost ceiling: $30K/month at 100K requests/month
Unsupported factual claims or invalid tool actions: <2% on a labeled eval set
Must resist prompt injection from retrieved content and tool outputs
Must respect per-user data access boundaries across connected systems
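
To make the numeric constraints above concrete, here is a minimal sketch that turns them into a per-request budget and an eval pass/fail gate. The function names and the 500-example eval set in the usage note are illustrative assumptions, not part of the original prompt; only the dollar, request, and percentage figures come from the Constraints list.

```python
# Figures taken from the Constraints list; everything else is illustrative.
MONTHLY_COST_CEILING_USD = 30_000
MONTHLY_REQUESTS = 100_000
MAX_ERROR_PERCENT = 2  # unsupported claims or invalid tool actions must stay below this


def per_request_budget_usd(ceiling_usd: float, requests: int) -> float:
    """Average spend allowed per request if the monthly ceiling is fully used."""
    return ceiling_usd / requests


def passes_error_gate(failures: int, eval_set_size: int) -> bool:
    """True if the failure rate is strictly below MAX_ERROR_PERCENT.

    Integer arithmetic avoids floating-point edge cases at the boundary.
    """
    if eval_set_size <= 0:
        raise ValueError("eval_set_size must be positive")
    return failures * 100 < MAX_ERROR_PERCENT * eval_set_size


if __name__ == "__main__":
    budget = per_request_budget_usd(MONTHLY_COST_CEILING_USD, MONTHLY_REQUESTS)
    print(f"Average budget per request: ${budget:.2f}")
```

Under these figures the average budget works out to $0.30 per request, covering all model calls, retrieval, and tool invocations for that request. On a hypothetical 500-example eval set, 9 failures would pass the <2% gate and 10 would not.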

Available Resources

Access to frontier and mid-tier LLM APIs with tool-calling support
Internal documents, structured records, and event logs with metadata and ACLs
External REST APIs for ticketing, search, and operational actions
A small labeling budget for golden-set creation and weekly regression testing
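
As one possible illustration of how the ACL metadata listed above could be used, the sketch below filters retrieved documents against the requesting user's groups before anything reaches a model. The `Document` type, its `allowed_groups` field, and `filter_by_acl` are hypothetical names chosen for this example; the prompt does not specify how ACLs are actually represented.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Document:
    """Hypothetical retrieved document carrying its ACL as a set of group names."""
    doc_id: str
    text: str
    allowed_groups: frozenset = field(default_factory=frozenset)


def filter_by_acl(docs, user_groups):
    """Keep only documents that share at least one group with the user.

    Documents with an empty ACL are dropped (deny by default), which is an
    assumption of this sketch rather than a stated requirement.
    """
    groups = frozenset(user_groups)
    return [d for d in docs if d.allowed_groups & groups]


if __name__ == "__main__":
    corpus = [
        Document("runbook-1", "Restart procedure", frozenset({"ops"})),
        Document("payroll-7", "Salary bands", frozenset({"hr"})),
        Document("orphan-3", "No ACL recorded"),
    ]
    visible = filter_by_acl(corpus, {"ops", "eng"})
    print([d.doc_id for d in visible])
```

Applying the filter at retrieval time, rather than asking the model to honor access rules, keeps enforcement outside the prompt, so content the user is not entitled to see never enters the context.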

Question

How would you design this agentic platform so it can reliably orchestrate multiple models, tools, and data sources in production while meeting the latency, cost, and safety requirements? Explain the architecture you would choose, how you would ground and evaluate the agent, and how you would handle failure modes such as hallucination, bad tool use, and prompt injection.
