Business Context
BrightDesk, a B2B customer support platform, wants to use an LLM to power two production features: (1) concise summaries of long support tickets for agents, and (2) question answering over internal help-center articles and policy documents. The goal is to reduce agent handle time without introducing unsupported or non-compliant answers.
Data
- Sources: 2.5M historical support tickets, 18,000 help-center articles, 6,500 internal policy/process documents
- Text length: tickets range from 30 to 4,000 words; documents range from 100 to 12,000 words
- Language: English only
- Labels / supervision: historical ticket resolutions, article metadata, and agent-written summaries for ~120,000 tickets
- Distribution: highly skewed toward billing, login, API integration, and account administration issues
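Given the mix of sources and only partial supervision above, it helps to normalize all three corpora into one record shape early. A minimal sketch using a dataclass; every field name here is an illustrative assumption, not a schema mandated by the brief.

```python
from dataclasses import dataclass, field


@dataclass
class SupportRecord:
    """One normalized unit from any of the three corpora.

    Field names are illustrative assumptions, not a prescribed schema.
    """
    record_id: str
    source: str                                   # "ticket" | "article" | "policy"
    text: str                                     # raw English text
    metadata: dict = field(default_factory=dict)  # e.g., product area, resolution
    reference_summary: str | None = None          # agent summary; ~120k tickets only
```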
Success Criteria
- Ticket summaries should preserve the key issue, customer impact, actions taken, and next steps
- QA responses should achieve grounded-answer accuracy ≥ 85% on a held-out benchmark
- Hallucination rate must stay below 3%
- P95 end-to-end latency must be under 2.5 seconds for interactive queries
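All three thresholds above can be checked mechanically once each benchmark response has been judged for grounding. A minimal sketch of that offline check, assuming judgments and latencies have already been collected into simple records; the judging step itself (human review or LLM-as-judge) is outside this snippet.

```python
import statistics
from dataclasses import dataclass


@dataclass
class EvalResult:
    grounded: bool      # answer fully supported by the retrieved documents
    hallucinated: bool  # answer contains unsupported content
    latency_s: float    # end-to-end wall-clock seconds for the query


def check_success_criteria(results: list[EvalResult]) -> dict[str, bool]:
    """Compare a judged benchmark run against the thresholds above."""
    n = len(results)  # assumes a non-empty benchmark
    grounded_acc = sum(r.grounded for r in results) / n
    halluc_rate = sum(r.hallucinated for r in results) / n
    # P95 latency: the 95th of the 99 cut points returned when n=100
    p95 = statistics.quantiles([r.latency_s for r in results], n=100)[94]
    return {
        "grounded_accuracy >= 0.85": grounded_acc >= 0.85,
        "hallucination_rate < 0.03": halluc_rate < 0.03,
        "p95_latency < 2.5s": p95 < 2.5,
    }
```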
Constraints
- No answer may include content not supported by retrieved documents
- PII must be masked before prompts are sent to the model (see the masking sketch after this list)
- Updates to knowledge documents must be reflected in the index within 15 minutes
- Deployment budget allows one managed embedding service and one hosted LLM endpoint
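The PII constraint implies a masking pass on every ticket or query before any text leaves the platform. A minimal regex-based sketch covering emails and phone numbers; the patterns are illustrative assumptions, and a production system would use a dedicated PII detector (e.g., Presidio) rather than hand-rolled regexes.

```python
import re

# Illustrative patterns only; swap in a real PII detection library for production.
_PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def mask_pii(text: str) -> str:
    """Replace detected PII spans with typed placeholders before prompting."""
    for label, pattern in _PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


assert "@" not in mask_pii("Customer email: jane.doe@example.com")
```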
Requirements
- Design a production workflow for both summarization and retrieval-augmented question answering.
- Define the preprocessing and chunking pipeline for tickets and knowledge documents (a chunking sketch follows this list).
- Specify how you would build embeddings, retrieval, reranking, prompting, and fallback logic (a retrieval-and-prompting sketch also follows).
- Describe how you would evaluate summary quality, answer grounding, latency, and failure modes.
- Provide a modern Python implementation for preprocessing, indexing, retrieval, prompting, and offline evaluation.
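A minimal sketch of the chunking step for knowledge documents. The 300-word window and 50-word overlap are tuning assumptions, and splitting on whitespace stands in for a real tokenizer; production chunking should also respect headings and sentence boundaries.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    chunk_id: str
    doc_id: str
    text: str


def chunk_document(doc_id: str, text: str,
                   max_words: int = 300, overlap: int = 50) -> list[Chunk]:
    """Split one document into overlapping word windows for indexing."""
    words = text.split()
    chunks: list[Chunk] = []
    start = idx = 0
    while start < len(words):
        window = words[start:start + max_words]
        chunks.append(Chunk(f"{doc_id}#{idx}", doc_id, " ".join(window)))
        if start + max_words >= len(words):
            break  # last window reached the end of the document
        start += max_words - overlap
        idx += 1
    return chunks
```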
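And a sketch of retrieval, grounded prompting, and the fallback path. The hashed bag-of-words `embed` is a runnable toy stand-in for the managed embedding service, the `min_score` floor of 0.3 is an assumption to calibrate, and a cross-encoder reranking stage (omitted here) would slot between `retrieve` and `build_prompt`.

```python
import numpy as np

REFUSAL = "I can't answer that from the current help-center content."


def embed(texts: list[str], dim: int = 256) -> np.ndarray:
    """Toy stand-in for the managed embedding service (hashed bag of words,
    L2-normalized). Replace with calls to the hosted endpoint in production."""
    vecs = np.zeros((len(texts), dim))
    for i, t in enumerate(texts):
        for tok in t.lower().split():
            vecs[i, hash(tok) % dim] += 1.0
    norms = np.linalg.norm(vecs, axis=1, keepdims=True)
    return vecs / np.maximum(norms, 1e-9)


def retrieve(query: str, chunk_texts: list[str], chunk_vecs: np.ndarray,
             k: int = 5, min_score: float = 0.3) -> list[str]:
    """Cosine top-k retrieval; the score floor doubles as the fallback trigger."""
    q = embed([query])[0]
    scores = chunk_vecs @ q  # rows of chunk_vecs are unit vectors
    top = np.argsort(scores)[::-1][:k]
    return [chunk_texts[i] for i in top if scores[i] >= min_score]


def build_prompt(query: str, passages: list[str]) -> str | None:
    """Grounded-QA prompt; None tells the caller to return REFUSAL instead."""
    if not passages:
        return None  # fallback: nothing retrieved above the score floor
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return ("Answer using ONLY the numbered passages below. If they do not "
            "contain the answer, say you cannot answer.\n\n"
            f"{context}\n\nQuestion: {query}\nAnswer:")
```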