Context
OrbitOps wants an internal AI assistant that helps engineers handle routine workflow tasks: triaging CI failures, summarizing incident context, drafting Jira tickets, and answering questions about runbooks and service ownership. The goal is to reduce interrupt load on senior engineers without creating unsafe or misleading automation.
Constraints
- p95 latency: 3,000 ms for a single-turn request
- Cost ceiling: $12K/month at 200K requests/month (about $0.06 per request on average)
- Hallucination ceiling: <2% on high-risk actions (ownership, runbook steps, incident status)
- Automation policy: the assistant may draft or recommend actions, but cannot execute production changes
- Safety: must resist prompt injection from tickets, logs, or docs; must not reveal secrets or hidden system instructions
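The cost and hallucination ceilings above imply a few derived numbers that are worth keeping in view during design. The following is a minimal sketch of that arithmetic; the function names and the per-call cost inputs are illustrative assumptions, not figures from this brief.

```python
from typing import Sequence

# Ceilings taken from the constraints list above.
MONTHLY_COST_CEILING_USD = 12_000
MONTHLY_REQUESTS = 200_000
HALLUCINATION_CEILING = 0.02  # applies to high-risk actions only


def per_request_budget_usd() -> float:
    """Average spend allowed per request at the stated volume."""
    return MONTHLY_COST_CEILING_USD / MONTHLY_REQUESTS


def within_cost_budget(router_usd: float, final_usd: float,
                       retrieval_usd: float = 0.0) -> bool:
    """Check a hypothetical per-request cost breakdown against the average budget."""
    total = router_usd + final_usd + retrieval_usd
    return total <= per_request_budget_usd()


def hallucination_rate(flags: Sequence[bool]) -> float:
    """Fraction of graded high-risk answers that a reviewer flagged as hallucinated."""
    return sum(flags) / len(flags) if flags else 0.0


if __name__ == "__main__":
    print(f"Average budget per request: ${per_request_budget_usd():.2f}")
```

Because the budget is an average, a design could spend more on requests that need the stronger model as long as cheaper routed requests offset it.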
Available Resources
- 120K internal documents: runbooks, postmortems, service catalog entries, RFCs, and on-call guides
- Tool APIs: Jira (create draft ticket), PagerDuty (read incidents), CI provider (read build status), service catalog lookup, and internal search
- Approved models: a fast low-cost model for routing and a stronger model for final responses
- 500 historical workflow examples with human-written resolutions
Task
- Design an LLM-powered workflow assistant that decides when to answer directly, when to retrieve documentation, and when to call read-only tools before producing a response.
- Define an evaluation plan first: offline golden sets, adversarial prompt-injection tests, hallucination measurement, and online success metrics after launch.
- Write a system prompt that enforces grounded behavior, tool-use boundaries, refusal behavior, and structured outputs for downstream systems.
- Propose the architecture, including retrieval, agent orchestration, fallback behavior, and how you would stay within the latency and cost budgets.
- Identify the major failure modes and mitigations, especially around hallucinated remediation steps, injected instructions in logs, and stale documentation.
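The first task item, deciding between a direct answer, documentation retrieval, and a read-only tool call, can be pictured as a small routing step. The sketch below is one possible shape under stated assumptions: the tool identifiers and keyword sets are hypothetical, and the keyword matching is only a stand-in for the fast low-cost routing model, not a proposed implementation of it.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Route(Enum):
    ANSWER_DIRECTLY = "answer_directly"
    RETRIEVE_DOCS = "retrieve_docs"
    CALL_TOOL = "call_tool"


# Hypothetical identifiers for the read-only tools listed in the brief.
# The Jira draft-ticket API is omitted because it is not read-only.
READ_ONLY_TOOLS = {
    "ci.read_build_status": {"ci", "build", "pipeline"},
    "pagerduty.read_incidents": {"incident", "incidents", "pagerduty"},
    "service_catalog.lookup": {"owner", "owns", "ownership"},
}

# Placeholder keywords that suggest a documentation lookup.
DOC_KEYWORDS = {"runbook", "runbooks", "postmortem", "rfc"}


@dataclass(frozen=True)
class RoutingDecision:
    route: Route
    tool: Optional[str] = None


def route_request(text: str) -> RoutingDecision:
    """Pick a route from whole-word keyword matches (stand-in for the router model)."""
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    for tool, keywords in READ_ONLY_TOOLS.items():
        if words & keywords:
            return RoutingDecision(Route.CALL_TOOL, tool)
    if words & DOC_KEYWORDS:
        return RoutingDecision(Route.RETRIEVE_DOCS)
    return RoutingDecision(Route.ANSWER_DIRECTLY)
```

Keeping the allowed tools in an explicit table means the orchestrator can reject any tool name the router emits that is not in the table, which is one way to hold the tool-use boundary even if injected text in a log or ticket tries to steer the router.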