Context
At StackForge, enterprise customers want to build internal AI agents on top of your API. Each customer should be able to connect private knowledge sources, define allowed tools, and deploy agents for workflows like policy Q&A, ticket triage, and internal operations lookup.
Constraints
- Multi-tenant SaaS backend with strict tenant isolation
- p95 latency: ≤3,000 ms for single-turn answers; ≤6,000 ms for tool-using agent runs
- Cost ceiling: $0.08 per agent run on average at 2M runs/month
- Hallucination ceiling: <2% materially unsupported answers on a tenant-specific golden set
- Prompt injection success rate: <1% on adversarial evals
- Agents must respect customer RBAC and document-level permissions
- Full audit trail required: prompts, retrieved docs, tool calls, outputs, and policy decisions
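The audit-trail constraint above can be made concrete as a per-run record. A minimal sketch follows; all field names here are illustrative assumptions, not a prescribed schema:

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
from typing import Any

@dataclass
class AuditRecord:
    # Illustrative schema covering the required audit fields:
    # prompts, retrieved docs, tool calls, outputs, and policy decisions.
    tenant_id: str
    run_id: str
    prompt: str                       # final prompt sent to the model
    retrieved_doc_ids: list[str]      # document-level provenance for the answer
    tool_calls: list[dict[str, Any]]  # name, arguments, result status per call
    output: str
    policy_decisions: list[str]       # e.g. "rbac:allow", "refusal:injection_suspected"
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

record = AuditRecord(
    tenant_id="acme",
    run_id="run-001",
    prompt="What is the travel policy?",
    retrieved_doc_ids=["conf:policy-42"],
    tool_calls=[],
    output="Travel must be booked 14 days in advance.",
    policy_decisions=["rbac:allow"],
)
print(asdict(record)["tenant_id"])  # dataclasses serialize to JSON-ready dicts
```

Keeping the record append-only and keyed by tenant_id supports both the isolation and auditability constraints.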
Available Resources
- Customer data connectors: Confluence, Google Drive, Slack exports, Jira, and internal HTTP APIs
- Approved models: a fast small model, a mid-tier reasoning model, and an embeddings model
- Managed vector store and keyword index
- Tool execution sandbox for HTTP calls and approved function tools
- 200 labeled tasks from 10 design partners for initial evaluation
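The 200 labeled tasks can seed an offline gate on the hallucination ceiling. A minimal harness sketch, where `judge_supported` is a stand-in assumption for a real grader (human label or LLM-as-judge):

```python
def judge_supported(answer: str, source_docs: list[str]) -> bool:
    # Placeholder grader: treat an answer as supported if it cites a
    # retrieved source. A real judge would check factual entailment.
    return any(doc in answer for doc in source_docs)

def hallucination_rate(golden_set: list[dict]) -> float:
    """Fraction of answers judged materially unsupported."""
    unsupported = sum(
        1 for case in golden_set
        if not judge_supported(case["answer"], case["sources"])
    )
    return unsupported / len(golden_set)

# Toy golden set; real cases come from the design partners' labeled tasks.
golden_set = [
    {"answer": "PTO accrues monthly per policy-42.", "sources": ["policy-42"]},
    {"answer": "Expenses are capped at $50/day.", "sources": ["policy-7"]},
]
rate = hallucination_rate(golden_set)
print(f"{rate:.0%}")  # compare against the <2% ceiling before release
```

Running this per tenant, on tenant-specific golden sets, is what makes the <2% ceiling enforceable rather than aspirational.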
Task
- Design the backend architecture for a multi-tenant agent platform, including ingestion, retrieval, orchestration, tool execution, memory, and audit logging.
- Specify how you would make the system eval-first: define offline and online evaluation, golden sets, hallucination measurement, and prompt-injection testing before choosing final model flows.
- Propose the agent runtime: when to retrieve, when to call tools, how to terminate, and how to enforce tenant permissions and refusal behavior.
- Describe how you would manage cost and latency together, including routing between models, caching, rate limits, and fallbacks.
- Identify the top failure modes for enterprise internal agents and how you would detect and mitigate them in production.
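The runtime, permission, and routing bullets above can be sketched together in one run loop. This is a hedged sketch under stated assumptions: the model names map to the approved tiers, the budget figures and `acl` field are hypothetical, and the model/tool calls are elided:

```python
MAX_STEPS = 6  # hard termination bound for tool-using runs

def allowed(doc: dict, user_perms: set[str]) -> bool:
    # Enforce document-level permissions at retrieval time, not post hoc,
    # so the model never sees content the user cannot.
    return doc["acl"] in user_perms

def route_model(needs_tools: bool, budget_cents: float) -> str:
    # Route single-turn answers to the fast small model; escalate to the
    # mid-tier reasoning model only when tools are needed and budget remains.
    if needs_tools and budget_cents >= 4.0:
        return "mid-tier-reasoning"
    return "fast-small"

def run_agent(question: str, docs: list[dict], user_perms: set[str],
              needs_tools: bool, budget_cents: float = 8.0) -> dict:
    visible = [d for d in docs if allowed(d, user_perms)]
    if not visible:
        # Refuse rather than answer from memory when RBAC leaves no context.
        return {"answer": None, "refused": True, "reason": "no permitted context"}
    model = route_model(needs_tools, budget_cents)
    steps = 0
    while steps < MAX_STEPS:
        steps += 1
        # ... call model, optionally execute one sandboxed tool per step ...
        break  # terminate as soon as the model emits a final answer
    return {"answer": f"[{model}] grounded in {len(visible)} docs",
            "refused": False, "steps": steps}
```

The refusal path doubles as a prompt-injection mitigation: an injected instruction cannot widen `visible` because permissions are applied before the model runs, and MAX_STEPS caps both cost and latency on runaway tool loops.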