Context
ForgeFlow sells an AI assistant for software teams that can answer questions about internal code/docs, draft pull request summaries, suggest fixes for CI failures, and optionally open tickets or propose code changes. The customer is a regulated enterprise and wants to deploy it across their engineering workflow without creating security, reliability, or compliance incidents.
Constraints
- p95 latency: ≤3,000ms for read-only tasks; ≤8,000ms for actions that call tools
- Cost ceiling: $35K/month at 40K requests/day
- Hallucination ceiling: <2% on high-risk tasks (code/config/security guidance), measured on a labeled golden set
- Prompt-injection success rate: <0.5% on adversarial evals
- Any action that changes state must be auditable, permission-scoped, and human-approved by default
- The system must avoid leaking secrets, proprietary code, or cross-team data
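The numeric constraints above double as launch gates, so they can be checked mechanically. A minimal sketch, assuming the metrics are already computed elsewhere; the `EvalMetrics` fields and function names are illustrative placeholders, not an existing ForgeFlow API:

```python
# Encode the constraints above as an automated release-gate check.
# Thresholds are taken directly from the Constraints list.
from dataclasses import dataclass

@dataclass
class EvalMetrics:
    p95_read_ms: float             # p95 latency, read-only tasks
    p95_action_ms: float           # p95 latency, tool-calling actions
    monthly_cost_usd: float
    hallucination_rate: float      # measured on the high-risk golden set
    injection_success_rate: float  # measured on adversarial evals

def passes_launch_gates(m: EvalMetrics) -> list[str]:
    """Return the list of violated gates; an empty list means ship."""
    violations = []
    if m.p95_read_ms > 3000:
        violations.append("p95 read latency > 3,000ms")
    if m.p95_action_ms > 8000:
        violations.append("p95 action latency > 8,000ms")
    if m.monthly_cost_usd > 35_000:
        violations.append("cost above $35K/month ceiling")
    if m.hallucination_rate >= 0.02:
        violations.append("hallucination rate >= 2% on high-risk tasks")
    if m.injection_success_rate >= 0.005:
        violations.append("prompt-injection success rate >= 0.5%")
    return violations
```

The same predicate can serve as both an offline launch gate and an online rollback trigger, so the two never drift apart.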
Available Resources
- 2M internal artifacts: code files, PRs, runbooks, incident docs, RFCs, CI logs, and issue tracker tickets
- Read-only tools for GitHub, Jira, CI, and internal docs; write tools exist but can be gated behind approval
- Approved models: a fast small model, a stronger general-purpose model, and an embedding model
- Security team can provide 200 adversarial prompt-injection examples and 100 secret-leakage test cases
- 25 staff engineers can label a 600-task golden set
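The split between open read-only tools and approval-gated write tools can be enforced at a single dispatch point. A sketch under stated assumptions: the tool names, the `ApprovalRequired` flow, and the stub integrations are hypothetical, not the actual GitHub/Jira/CI APIs:

```python
# Gate write tools behind human approval while leaving reads open.
READ_ONLY_TOOLS = {"github_read", "jira_read", "ci_read", "docs_read"}
WRITE_TOOLS = {"jira_create_ticket", "github_open_pr"}

class ApprovalRequired(Exception):
    """Raised when a state-changing tool is invoked without sign-off."""

def dispatch_tool(name: str, args: dict, approved: bool = False):
    if name in READ_ONLY_TOOLS:
        return _call(name, args)               # no gate on reads
    if name in WRITE_TOOLS:
        if not approved:
            # Route to a human reviewer instead of executing.
            raise ApprovalRequired(f"{name} needs human sign-off")
        _audit_log(name, args)                 # auditable by design
        return _call(name, args)
    raise ValueError(f"unknown tool: {name}")

def _call(name, args):
    # Stub standing in for the real integrations.
    return {"tool": name, "args": args}

def _audit_log(name, args):
    print(f"AUDIT {name} {args}")
```

Keeping the gate in the dispatcher rather than in each tool means no individual integration can forget to check for approval.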
Task
- Design a safe LLM architecture for this product, including which capabilities should be read-only, which can use tools, and where human approval is required.
- Define an eval-first deployment plan: offline evals, online metrics, launch gates, and rollback criteria.
- Specify how you would reduce hallucination, prompt injection, and data leakage risk while staying within the latency and cost limits.
- Write a production-grade system prompt for the assistant that handles grounded answering, tool use, and refusal behavior.
- Estimate cost/latency and identify the top failure modes, detection signals, and mitigations.
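For the cost/latency deliverable in the last bullet, a back-of-envelope check shows what the $35K/month ceiling implies per request. The per-request model prices and the 80/20 routing split below are illustrative assumptions, not quotes for the approved models:

```python
# What the $35K/month ceiling means per request, and whether a
# small/large model routing split fits under it.
daily_requests = 40_000
monthly_requests = daily_requests * 30           # 1,200,000 requests/month
budget_per_request = 35_000 / monthly_requests   # ~$0.029 per request

# Hypothetical blended cost: route 80% of traffic to the fast small
# model and 20% to the stronger general-purpose model.
small_cost, large_cost = 0.004, 0.06             # assumed $/request
blended = 0.8 * small_cost + 0.2 * large_cost    # $0.0152

print(f"budget/request: ${budget_per_request:.4f}")
print(f"blended cost:   ${blended:.4f}  (fits: {blended <= budget_per_request})")
```

The margin between the blended cost and the per-request budget is what pays for embeddings, retrieval, and guardrail passes, so the routing split should be tuned with those overheads included.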