Context
FinSure, a global insurance company, wants an internal assistant that answers employee questions about HR policies, compliance manuals, security standards, and operating procedures. The feature is for 18,000 employees and must provide grounded answers with citations because incorrect guidance can create legal and audit risk.
Constraints
- p95 latency: 2,500ms for interactive queries
- Cost ceiling: $35K/month at 1.2M queries/month
- Hallucination ceiling: <2% unsupported factual claims on a labeled evaluation set
- Prompt injection success rate: <0.5% on adversarial tests
- Must respect document-level access controls and avoid exposing PII or confidential policy content to unauthorized users
- Answers must cite sources for all policy or compliance claims
Available Data / Models
- 1.8M internal documents: PDFs, Word docs, wiki pages, and policy manuals
- Metadata per document: business unit, sensitivity tier, owner, effective date, region, and ACLs
- Enterprise search index with BM25 support
- Managed vector database approved for internal use
- Access to OpenAI GPT-4.1-mini / GPT-4.1 and
text-embedding-3-large
- 2,000 historical employee questions and 150 compliance-reviewed answers for seeding evaluation
Deliverables
- Design the end-to-end RAG architecture, including ingestion, chunking, retrieval, reranking, generation, and permission filtering.
- Define an evaluation plan before architecture: offline quality/safety benchmarks and online monitoring for quality, safety, latency, and cost.
- Write a system prompt that enforces grounded answers, citations, refusal behavior, and resistance to prompt injection from retrieved content.
- Estimate request-level and monthly cost/latency, and explain how you would stay within budget while meeting the hallucination target.
- Identify the main failure modes in production and propose concrete mitigations and alerts.