Context
BrightOps runs customer support, vendor coordination, and internal reporting through email, Slack, and a ticketing system. The operations team wants an LLM-powered copilot that summarizes daily work, drafts routine responses, and extracts action items so operators can move faster without missing critical details.
Constraints
- p95 latency: ≤1,500 ms for summary/drafting requests
- Cost ceiling: $8K/month at 100K requests/month
- Hallucination ceiling: <2% on a labeled offline set for factual summaries and extracted action items
- The assistant must not fabricate commitments, deadlines, or policy details
- Prompt injection risk is real because inputs may include forwarded emails or pasted external content
- If confidence is low or evidence is missing, the system should abstain or ask for clarification
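The abstention constraint implies the output contract must treat "no answer" as a first-class result rather than an error. A minimal sketch of what such a contract might look like (the field names and statuses below are illustrative assumptions, not part of the brief):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ActionItem:
    description: str
    owner: Optional[str]          # None when the source text names no owner
    due_date: Optional[str]       # ISO date, only when explicitly stated
    evidence: str                 # verbatim quote supporting the item

@dataclass
class ExtractionResult:
    status: str                   # "ok" | "abstain" | "needs_clarification"
    items: list = field(default_factory=list)
    reason: Optional[str] = None  # required whenever status != "ok"

def validate(result: ExtractionResult) -> bool:
    """Reject results that assert items without supporting evidence,
    or that abstain without saying why."""
    if result.status != "ok":
        return result.reason is not None
    return all(item.evidence.strip() for item in result.items)
```

Requiring a verbatim `evidence` quote per item gives a cheap post-hoc check against fabricated commitments: any item whose quote does not appear in the source can be dropped or flagged.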
Available Resources
- 12 months of historical operations data: emails, Slack threads, ticket comments, and final human-written resolutions
- 2,000 manually reviewed examples of high-quality summaries and action-item lists
- Access to a GPT-4-class or Claude-class model, plus a cheaper fallback model
- Existing metadata: ticket priority, owner, due date, customer tier, and resolution status
- A small annotation budget for creating a 300-example golden set
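The 300-example golden set is what makes the <2% hallucination ceiling measurable offline. One way to sketch the metric, assuming action items are compared as normalized strings (a real evaluation would likely use human judgment or entailment scoring rather than exact match):

```python
def hallucination_rate(predicted: list[str], gold: list[str]) -> float:
    """Fraction of predicted action items not supported by any gold item.

    Matching here is naive exact-match after normalization; it is a
    placeholder for a stricter support check.
    """
    if not predicted:
        return 0.0  # abstaining never counts as hallucinating
    gold_norm = {g.strip().lower() for g in gold}
    unsupported = [p for p in predicted if p.strip().lower() not in gold_norm]
    return len(unsupported) / len(predicted)
```

Averaged over the golden set, this rate can be tracked per release and compared directly against the 2% ceiling before any architecture change ships.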
Task
- Design an LLM workflow that improves daily operational efficiency for ops staff through summarization, drafting, and structured action-item extraction.
- Write a production-ready system prompt that minimizes hallucinations, handles ambiguous inputs safely, and returns structured output.
- Define an evaluation plan before finalizing architecture, including offline quality metrics and online success metrics.
- Estimate cost and latency at target volume, and explain when to use the expensive model vs. a cheaper fallback.
- Identify key failure modes, especially hallucinated commitments and prompt injection from untrusted content, and propose mitigations.
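The cost ceiling pins down the routing math: $8K/month over 100K requests/month is $0.08 per request on average, so the share of traffic the expensive model can take follows from the blended cost. A back-of-envelope sketch (the per-call costs below are placeholder assumptions, not vendor quotes):

```python
# Budget arithmetic from the brief: $8K/month over 100K requests/month.
MONTHLY_BUDGET_USD = 8_000
MONTHLY_REQUESTS = 100_000
PER_REQUEST_BUDGET = MONTHLY_BUDGET_USD / MONTHLY_REQUESTS  # $0.08/request

# Illustrative per-request costs; real numbers depend on prompt/output
# token counts and current model pricing.
COST_EXPENSIVE = 0.12  # GPT-4-class or Claude-class call
COST_CHEAP = 0.01      # fallback-model call

def max_expensive_fraction() -> float:
    """Largest share f of traffic the expensive model can take while
    f*COST_EXPENSIVE + (1-f)*COST_CHEAP stays within PER_REQUEST_BUDGET."""
    f = (PER_REQUEST_BUDGET - COST_CHEAP) / (COST_EXPENSIVE - COST_CHEAP)
    return max(0.0, min(1.0, f))
```

Under these assumed prices roughly two-thirds of requests could use the expensive model; in practice the router would spend that share on high-stakes drafts (e.g., top-tier customers, externally visible replies) and send routine summaries to the fallback.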