Context
BrightDesk, a SaaS helpdesk platform, wants an LLM-powered assistant for support agents handling live customer chats. The assistant should draft grounded responses, summarize account context, and suggest next actions so agents can respond faster without giving incorrect policy or product guidance.
Constraints
- p95 latency: ≤1,500 ms per assistant turn
- Cost ceiling: $12K/month at 300K assistant turns/month
- Accuracy bar: at least 85% of drafts rated "acceptable without major edits" on an internal review set
- Hallucination ceiling: fewer than 2% of responses may contain unsupported product or policy claims
- Safety: must resist prompt injection from pasted customer text, must not leak PII across accounts, and must refuse unsupported billing/legal claims
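For calibration, the cost ceiling implies a hard per-turn budget of $12K / 300K = $0.04. A back-of-envelope sketch of how that budget compares to likely per-turn model spend; the token profile and GPT-4.1-mini per-token prices here are illustrative assumptions, not quoted rates:

```python
# Back-of-envelope cost check for the $12K/month ceiling at 300K turns/month.
MONTHLY_BUDGET_USD = 12_000
TURNS_PER_MONTH = 300_000

per_turn_budget = MONTHLY_BUDGET_USD / TURNS_PER_MONTH  # $0.04 per turn

# Assumed (illustrative) GPT-4.1-mini prices per 1M tokens -- verify current rates.
PRICE_IN_PER_M = 0.40   # USD per 1M input tokens
PRICE_OUT_PER_M = 1.60  # USD per 1M output tokens

# Assumed per-turn token profile: retrieved articles + CRM context + instructions.
tokens_in, tokens_out = 6_000, 500
per_turn_cost = (tokens_in / 1e6) * PRICE_IN_PER_M + (tokens_out / 1e6) * PRICE_OUT_PER_M

print(f"budget ${per_turn_budget:.3f}/turn, est. generation cost ${per_turn_cost:.4f}/turn")
```

Under these assumptions generation is roughly a tenth of the budget, leaving headroom for embeddings, retries, and occasional escalation to a larger model on hard turns.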
Available Resources
- 40K help center articles, internal SOPs, and policy docs
- CRM metadata for the active customer: plan tier, open tickets, product usage summary, and recent chat history
- Approved models: GPT-4.1-mini for generation, a smaller embedding model for retrieval
- Existing search stack supports BM25 and vector search
- Historical support conversations with agent edits and CSAT outcomes
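Since the existing stack already exposes both BM25 and vector search, one common way to combine them for grounding is reciprocal rank fusion. A minimal sketch (the document IDs and the damping constant `k` are illustrative):

```python
from collections import defaultdict

def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: merge several ranked doc-ID lists.

    Each doc scores sum(1 / (k + rank)) over the lists it appears in;
    k=60 is the commonly used damping constant.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Toy example: BM25 and vector search disagree; fusion rewards agreement.
bm25 = ["sop-billing-7", "kb-refunds-2", "kb-plans-9"]
vect = ["kb-refunds-2", "kb-plans-9", "sop-billing-7"]
print(rrf([bm25, vect]))  # → ['kb-refunds-2', 'sop-billing-7', 'kb-plans-9']
```

RRF needs no score normalization across the two retrievers, which makes it a low-risk default for a six-week MVP.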
Task
- Design a practical LLM solution for this customer-facing workflow, including prompt design and when to use retrieval versus account context.
- Define an evaluation plan first: offline golden set, hallucination measurement, and online success metrics after launch.
- Propose safeguards for prompt injection, unsupported answers, and PII handling in a live support environment.
- Estimate cost and latency at the target volume, and explain the main tradeoffs.
- Describe how you would measure whether the assistant actually improved customer outcomes, not just model quality.
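The offline portion of the evaluation plan can be wired up as a simple release gate over reviewed drafts. A sketch assuming per-draft judgments come from human review or an LLM judge; the field names are hypothetical, but the thresholds mirror the constraints above:

```python
from dataclasses import dataclass

@dataclass
class Judgment:
    acceptable: bool         # "acceptable without major edits" per reviewer
    unsupported_claim: bool  # contains an unsupported product/policy claim

def release_gate(judgments: list[Judgment],
                 min_acceptable: float = 0.85,
                 max_hallucination: float = 0.02) -> bool:
    """Pass only if both the accuracy bar and the hallucination ceiling hold."""
    n = len(judgments)
    acceptable_rate = sum(j.acceptable for j in judgments) / n
    hallucination_rate = sum(j.unsupported_claim for j in judgments) / n
    return acceptable_rate >= min_acceptable and hallucination_rate < max_hallucination

# Toy run: 9/10 acceptable, but 1/10 hallucinated -> fails the 2% ceiling.
batch = [Judgment(True, False)] * 9 + [Judgment(False, True)]
print(release_gate(batch))  # False
```

Running this gate on each prompt or retrieval change keeps the two launch constraints enforced mechanically rather than by ad-hoc inspection.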
Your answer should be concrete. Assume you are the engineer responsible for shipping an MVP in six weeks with one product manager and two backend engineers.