Context
BrightCart, an e-commerce platform for SMB merchants, wants an AI assistant inside its customer support chat to improve customer satisfaction on order, refund, shipping, and account questions. The current bot responds quickly, but customers do not trust it because it sometimes gives confident but incorrect policy answers.
Constraints
- p95 latency: 2,500ms end-to-end
- Cost ceiling: $12K/month at 400K support conversations/month
- Customer satisfaction target: +4-point lift in post-chat CSAT
- Hallucination ceiling: <2% of answers to policy and refund questions
- Escalation rate must not increase by more than 1 percentage point
- Must resist prompt injection from user messages and retrieved documents
- Must not expose PII or internal-only policy notes
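The cost ceiling and conversation volume above imply an average per-conversation budget, which is a useful number to keep in view when sizing prompts and retrieval context. A minimal sketch of that arithmetic follows; the function and constant names are illustrative, and the calculation assumes spend is spread evenly across conversations.

```python
# Figures taken from the constraints above.
MONTHLY_COST_CEILING_USD = 12_000
CONVERSATIONS_PER_MONTH = 400_000


def per_conversation_budget(ceiling_usd: float, conversations: int) -> float:
    """Average spend allowed per conversation under the monthly ceiling."""
    if conversations <= 0:
        raise ValueError("conversations must be positive")
    return ceiling_usd / conversations


def monthly_cost(cost_per_conversation_usd: float, conversations: int) -> float:
    """Projected monthly spend for a given average cost per conversation."""
    return cost_per_conversation_usd * conversations


budget = per_conversation_budget(MONTHLY_COST_CEILING_USD, CONVERSATIONS_PER_MONTH)
print(f"Budget per conversation: ${budget:.3f}")  # Budget per conversation: $0.030
```

The $0.03 figure covers everything billed per conversation (generation, embeddings, and any other per-request charges), so multi-turn conversations have to share it across turns.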
Available Resources
- 80K help-center articles, policy pages, and macro templates
- 2 years of anonymized support tickets with CSAT, escalation outcome, and resolution tags
- Structured order metadata (order status, shipment ETA, refund eligibility)
- Approved models: GPT-4.1-mini for generation, text-embedding-3-large for embeddings
- Existing hybrid search stack (BM25 + vector search)
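The brief does not say how the existing hybrid stack merges BM25 and vector results. One common option is reciprocal rank fusion (RRF), sketched below purely as an assumption; the document IDs are made up, and `k=60` is a conventional default rather than a tuned value.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Ties are broken by document ID for determinism.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from the two retrievers.
bm25_hits = ["refund-policy", "shipping-times", "returns-faq"]
vector_hits = ["returns-faq", "refund-policy", "account-reset"]

fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(fused)
```

RRF uses only ranks, so it avoids having to normalize BM25 scores against cosine similarities, which live on different scales.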
Task
- Design an LLM-powered support assistant that uses retrieval and available structured data to improve customer satisfaction while staying within latency and cost limits.
- Write the system prompt for grounded answers, safe refusal behavior, and clear escalation rules when the answer is uncertain or policy-sensitive.
- Define the evaluation plan before the architecture: cover offline quality and safety evaluation, plus online metrics tied to CSAT, containment, and escalation.
- Estimate per-request latency and both per-request and monthly cost, and explain what you would change if the system misses either budget.
- Identify key failure modes, especially hallucinated policy answers, prompt injection, stale policies, and PII leakage, and propose mitigations.
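One way to make the evaluation plan concrete is to encode the stated constraints as a release gate that an evaluation run must pass. The sketch below is illustrative only: the `EvalResult` fields and function names are hypothetical, the p95 uses the nearest-rank method, and the sample numbers are invented rather than measured.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    latencies_ms: list          # end-to-end latency samples
    hallucination_rate: float   # fraction of policy/refund answers judged wrong
    escalation_delta_pp: float  # change in escalation rate, percentage points
    csat_lift_points: float     # change in post-chat CSAT, points


def p95(samples):
    """Nearest-rank 95th percentile."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer math
    return ordered[rank - 1]


def gate_failures(result):
    """Return the list of constraints from the brief that the run violates."""
    failures = []
    if p95(result.latencies_ms) > 2500:
        failures.append("p95 latency above 2,500ms")
    if result.hallucination_rate >= 0.02:
        failures.append("hallucination rate not below 2%")
    if result.escalation_delta_pp > 1.0:
        failures.append("escalation rate up by more than 1 point")
    if result.csat_lift_points < 4.0:
        failures.append("CSAT lift below 4 points")
    return failures


# Invented sample run: 100 latency samples from 20ms to 2,000ms.
run = EvalResult(
    latencies_ms=[20 * i for i in range(1, 101)],
    hallucination_rate=0.015,
    escalation_delta_pp=0.4,
    csat_lift_points=4.5,
)
print(gate_failures(run))  # []
```

Latency and hallucination rate can be checked offline before launch; the escalation and CSAT checks only become meaningful once online experiment data is available.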