Context
At StackForge, enterprise customers want to build internal AI agents on top of your API. Each customer should be able to connect private knowledge sources, define allowed tools, and deploy agents for workflows like policy Q&A, ticket triage, and internal operations lookup.
Constraints
- Multi-tenant SaaS backend with strict tenant isolation
- p95 latency: ≤3,000 ms for single-turn answers; ≤6,000 ms for tool-using agent runs
- Cost ceiling: $0.08 per agent run on average at 2M runs/month
- Hallucination ceiling: <2% materially unsupported answers on a tenant-specific golden set
- Prompt injection success rate: <1% on adversarial evals
- Agents must respect customer RBAC and document-level permissions
- Full audit trail required: prompts, retrieved docs, tool calls, outputs, and policy decisions
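The audit-trail constraint above can be made concrete as a per-run record. A minimal sketch follows; all field names here are illustrative assumptions, not a prescribed schema:

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
from typing import Any

@dataclass
class AuditRecord:
    # Illustrative schema covering the required audit fields:
    # prompts, retrieved docs, tool calls, outputs, and policy decisions.
    tenant_id: str
    run_id: str
    prompt: str                       # final prompt sent to the model
    retrieved_doc_ids: list[str]      # document-level provenance for the answer
    tool_calls: list[dict[str, Any]]  # name, arguments, result status per call
    output: str
    policy_decisions: list[str]       # e.g. "rbac:allow", "refusal:injection_suspected"
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

record = AuditRecord(
    tenant_id="acme",
    run_id="run-001",
    prompt="What is the travel policy?",
    retrieved_doc_ids=["conf:policy-42"],
    tool_calls=[],
    output="Travel must be booked 14 days in advance.",
    policy_decisions=["rbac:allow"],
)
print(asdict(record)["tenant_id"])  # dataclasses serialize to JSON-ready dicts
```

Keeping the record append-only and keyed by tenant_id supports both the isolation and auditability constraints.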
Available Resources
- Customer data connectors: Confluence, Google Drive, Slack exports, Jira, and internal HTTP APIs
- Approved models: a fast small model, a mid-tier reasoning model, and an embeddings model
- Managed vector store and keyword index
- Tool execution sandbox for HTTP calls and approved function tools
- 200 labeled tasks from 10 design partners for initial evaluation
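The 200 labeled tasks can seed an offline gate on the hallucination ceiling. A minimal harness sketch, where `judge_supported` is a stand-in assumption for a real grader (human label or LLM-as-judge):

```python
def judge_supported(answer: str, source_docs: list[str]) -> bool:
    # Placeholder grader: treat an answer as supported if it cites a
    # retrieved source. A real judge would check factual entailment.
    return any(doc in answer for doc in source_docs)

def hallucination_rate(golden_set: list[dict]) -> float:
    """Fraction of answers judged materially unsupported."""
    unsupported = sum(
        1 for case in golden_set
        if not judge_supported(case["answer"], case["sources"])
    )
    return unsupported / len(golden_set)

# Toy golden set; real cases come from the design partners' labeled tasks.
golden_set = [
    {"answer": "PTO accrues monthly per policy-42.", "sources": ["policy-42"]},
    {"answer": "Expenses are capped at $50/day.", "sources": ["policy-7"]},
]
rate = hallucination_rate(golden_set)
print(f"{rate:.0%}")  # compare against the <2% ceiling before release
```

Running this per tenant, on tenant-specific golden sets, is what makes the <2% ceiling enforceable rather than aspirational.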
Task
- Design the backend architecture for a multi-tenant agent platform, including ingestion, retrieval, orchestration, tool execution, memory, and audit logging.
- Specify how you would make the system eval-first: define offline and online evaluation, golden sets, hallucination measurement, and prompt-injection testing before choosing final model flows.
- Propose the agent runtime: when to retrieve, when to call tools, how to terminate, and how to enforce tenant permissions and refusal behavior.
- Describe how you would manage cost and latency together, including routing between models, caching, rate limits, and fallbacks.
- Identify the top failure modes for enterprise internal agents and how you would detect and mitigate them in production.
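The runtime, permission, and routing bullets above can be sketched together in one run loop. This is a hedged sketch under stated assumptions: the model names map to the approved tiers, the budget figures and `acl` field are hypothetical, and the model/tool calls are elided:

```python
MAX_STEPS = 6  # hard termination bound for tool-using runs

def allowed(doc: dict, user_perms: set[str]) -> bool:
    # Enforce document-level permissions at retrieval time, not post hoc,
    # so the model never sees content the user cannot.
    return doc["acl"] in user_perms

def route_model(needs_tools: bool, budget_cents: float) -> str:
    # Route single-turn answers to the fast small model; escalate to the
    # mid-tier reasoning model only when tools are needed and budget remains.
    if needs_tools and budget_cents >= 4.0:
        return "mid-tier-reasoning"
    return "fast-small"

def run_agent(question: str, docs: list[dict], user_perms: set[str],
              needs_tools: bool, budget_cents: float = 8.0) -> dict:
    visible = [d for d in docs if allowed(d, user_perms)]
    if not visible:
        # Refuse rather than answer from memory when RBAC leaves no context.
        return {"answer": None, "refused": True, "reason": "no permitted context"}
    model = route_model(needs_tools, budget_cents)
    steps = 0
    while steps < MAX_STEPS:
        steps += 1
        # ... call model, optionally execute one sandboxed tool per step ...
        break  # terminate as soon as the model emits a final answer
    return {"answer": f"[{model}] grounded in {len(visible)} docs",
            "refused": False, "steps": steps}
```

The refusal path doubles as a prompt-injection mitigation: an injected instruction cannot widen `visible` because permissions are applied before the model runs, and MAX_STEPS caps both cost and latency on runaway tool loops.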