Context
BrightDesk is building an internal AI assistant for sales engineers and product managers. One common use case is answering stakeholder questions in plain business language, for example: "What's the difference between fine-tuning and RAG, and when should we use each?"
Constraints
- p95 latency: ≤1,500 ms
- Cost ceiling: $3,000/month at 20,000 requests/month
- Hallucination rate: <2% on a 150-question golden set
- Answers must be understandable to non-technical stakeholders and avoid unnecessary jargon
- The assistant must not invent company capabilities, customer examples, or ROI claims
- Prompt injection and unsupported claims are considered real production risks
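The cost ceiling implies a hard per-request budget, which in turn bounds how much retrieved context each prompt can carry. A quick sketch of the arithmetic; the per-token prices are illustrative assumptions, not BrightDesk's actual rates:

```python
# Per-request budget implied by the constraints above.
MONTHLY_COST_CEILING = 3_000.00   # USD
MONTHLY_REQUESTS = 20_000

per_request_budget = MONTHLY_COST_CEILING / MONTHLY_REQUESTS  # $0.15/request

# Assumed (illustrative) hosted-LLM prices, USD per 1K tokens.
PRICE_IN_PER_1K = 0.003
PRICE_OUT_PER_1K = 0.015

def request_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimated cost of one request at the assumed prices."""
    return (prompt_tokens / 1000) * PRICE_IN_PER_1K \
         + (completion_tokens / 1000) * PRICE_OUT_PER_1K

# Example: a RAG prompt with ~3K tokens of retrieved context, ~400-token answer.
cost = request_cost(prompt_tokens=3_000, completion_tokens=400)
print(f"budget/request: ${per_request_budget:.2f}, est. cost: ${cost:.4f}")
```

At these assumed prices a context-heavy RAG request costs roughly a tenth of the $0.15 ceiling, so cost pressure is unlikely to be the binding constraint; latency is.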
Available Resources
- A curated internal knowledge base with 40 short documents: AI glossary, architecture patterns, pricing notes, case studies, and approved messaging
- 200 historical stakeholder questions with human-written answers
- Access to a hosted LLM, embedding model, and vector search index
- PM and solutions engineering reviewers who can label a small evaluation set
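The 200 historical Q&A pairs are enough to seed the 150-question golden set plus a held-out slice for prompt iteration. A minimal, reproducible split sketch; the dict shape of the pairs is an assumption:

```python
import random

# Hypothetical shape for the 200 historical stakeholder Q&A pairs.
qa_pairs = [{"question": f"Q{i}", "reference_answer": f"A{i}"} for i in range(200)]

rng = random.Random(42)  # fixed seed so reviewers can reproduce the split
shuffled = qa_pairs[:]
rng.shuffle(shuffled)

golden_set = shuffled[:150]  # frozen set for hallucination/factuality scoring
dev_set = shuffled[150:]     # used for prompt iteration, never for final reporting

assert len(golden_set) == 150 and len(dev_set) == 50
```

Freezing the golden set before any prompt tuning keeps the <2% hallucination measurement honest; the dev slice absorbs iteration.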
Task
- Propose whether you would solve this primarily with prompt design, RAG, fine-tuning, or a combination, and justify the choice for this use case.
- Design an evaluation plan first: define offline and online metrics for clarity, factuality, hallucination, and stakeholder usefulness.
- Write a system prompt that explains fine-tuning vs RAG in plain English, includes when to use each, and instructs the assistant to refuse unsupported business claims.
- Describe the serving architecture, including whether retrieval is needed, how you would ground answers, and how you would keep latency and cost within budget.
- Identify likely failure modes such as jargon-heavy answers, hallucinated examples, and prompt injection, and explain mitigations.
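As one concrete mitigation for hallucinated examples, generated answers can be screened for sentences that share little vocabulary with the retrieved passages before they are shown to stakeholders. A rough heuristic sketch; the tokenization and the 0.3 overlap threshold are assumptions, and this supplements rather than replaces human review:

```python
import re

def tokens(text: str) -> set[str]:
    """Lowercased word tokens, dropping very short stopword-like words."""
    return {w for w in re.findall(r"[a-z0-9']+", text.lower()) if len(w) > 3}

def unsupported_sentences(answer: str, retrieved_passages: list[str],
                          min_overlap: float = 0.3) -> list[str]:
    """Flag answer sentences whose content words barely appear in any passage."""
    passage_vocab = set().union(*(tokens(p) for p in retrieved_passages))
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        words = tokens(sentence)
        if not words:
            continue
        overlap = len(words & passage_vocab) / len(words)
        if overlap < min_overlap:
            flagged.append(sentence)
    return flagged

# Example: a grounded claim passes; an invented customer example is flagged.
passages = ["Retrieval-augmented generation grounds answers in indexed documents."]
answer = ("Retrieval-augmented generation grounds answers in indexed documents. "
          "Acme Corp saw a 40% revenue lift after deploying it.")
print(unsupported_sentences(answer, passages))
```

Flagged sentences can be routed to a stricter "cite your source or omit" regeneration step, which also blunts prompt-injection payloads that try to smuggle unsupported ROI claims into answers.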