Design a Multi-Agent Research Assistant

Scenario

You are building an internal research assistant that answers complex analyst questions by coordinating multiple agents: one plans the task, one retrieves internal documents, one queries approved external sources, and one synthesizes a final answer. Users ask multi-step questions that often require comparing policies, summarizing recent changes, and citing evidence. The system is expected to support roughly 8,000 queries per day, with noticeable spikes during incident reviews and quarterly planning.

Constraints

p95 latency: 4,000 ms for standard queries
Cost ceiling: $12K/month at projected volume
Unsupported or weakly grounded claims must stay below 4% on a 300-question golden set
Must resist prompt injection from retrieved content and external web pages
Final answers must include source-backed citations and refuse when evidence is insufficient
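
To make the numeric constraints concrete, the following is a rough back-of-the-envelope sketch rather than part of the original prompt. It assumes a 30-day month with traffic spread evenly (the scenario notes spikes, so real headroom is smaller), and it assumes the 4% limit is counted per golden-set question; the prompt does not say whether it is measured per question or per claim. The function names are illustrative only.

```python
# Back-of-the-envelope budget check.
# Assumptions (not stated in the prompt): 30-day month, evenly spread traffic,
# and the 4% grounding limit counted per golden-set question.

QUERIES_PER_DAY = 8_000
DAYS_PER_MONTH = 30            # assumption
MONTHLY_BUDGET_USD = 12_000
GOLDEN_SET_SIZE = 300
MAX_UNSUPPORTED_PERCENT = 4    # failures must stay strictly below this rate


def per_query_budget_usd(monthly_budget: float, queries_per_day: int, days: int) -> float:
    """Average spend available per query across all agents and tool calls."""
    return monthly_budget / (queries_per_day * days)


def max_allowed_failures(set_size: int, max_percent: int) -> int:
    """Largest failure count whose rate is strictly below max_percent."""
    # Integer arithmetic avoids floating-point edge cases at the boundary.
    return (set_size * max_percent - 1) // 100


if __name__ == "__main__":
    budget = per_query_budget_usd(MONTHLY_BUDGET_USD, QUERIES_PER_DAY, DAYS_PER_MONTH)
    failures = max_allowed_failures(GOLDEN_SET_SIZE, MAX_UNSUPPORTED_PERCENT)
    print(f"Average budget per query: ${budget:.3f}")
    print(f"Golden-set failures allowed: at most {failures} of {GOLDEN_SET_SIZE}")
```

Under these assumptions the average budget is about five cents per query, shared by every model and tool call in the workflow, which is one reason to weigh a simpler flow against a full four-agent pass for each query.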

Available Resources

Internal document corpus of ~200K markdown, PDF, and wiki pages
Approved LLM APIs, embedding models, and tool-calling support
Hybrid search over internal content and a small allowlisted external search API
20 hours of SME labeling time per month for evals and error analysis

Question

How would you design the agentic workflow and multi-agent orchestration for this system so it remains grounded, safe, and cost-effective under these constraints? Explain how you would decide when to use multiple agents versus a simpler flow, and how you would evaluate whether the orchestration is actually helping.
