Design Enterprise Policy RAG Assistant

Context

FinSure, a global insurance company, wants an internal assistant that answers employee questions about HR policies, compliance manuals, security standards, and operating procedures. The feature is for 18,000 employees and must provide grounded answers with citations because incorrect guidance can create legal and audit risk.

Constraints

p95 latency: 2,500ms for interactive queries
Cost ceiling: $35K/month at 1.2M queries/month
Hallucination ceiling: <2% unsupported factual claims on a labeled evaluation set
Prompt injection success rate: <0.5% on adversarial tests
Must respect document-level access controls and avoid exposing PII or confidential policy content to unauthorized users
Answers must cite sources for all policy or compliance claims

Available Data / Models

1.8M internal documents: PDFs, Word docs, wiki pages, and policy manuals
Metadata per document: business unit, sensitivity tier, owner, effective date, region, and ACLs
Enterprise search index with BM25 support
Managed vector database approved for internal use
Access to OpenAI GPT-4.1-mini / GPT-4.1 and text-embedding-3-large
2,000 historical employee questions and 150 compliance-reviewed answers for seeding evaluation

Deliverables

Design the end-to-end RAG architecture, including ingestion, chunking, retrieval, reranking, generation, and permission filtering.
Define an evaluation plan before architecture: offline quality/safety benchmarks and online monitoring for quality, safety, latency, and cost.
Write a system prompt that enforces grounded answers, citations, refusal behavior, and resistance to prompt injection from retrieved content.
Estimate request-level and monthly cost/latency, and explain how you would stay within budget while meeting the hallucination target.
Identify the main failure modes in production and propose concrete mitigations and alerts.

Context

Constraints

p95 latency: 2,500ms for interactive queries
Cost ceiling: $35K/month at 1.2M queries/month
Hallucination ceiling: <2% unsupported factual claims on a labeled evaluation set
Prompt injection success rate: <0.5% on adversarial tests
Must respect document-level access controls and avoid exposing PII or confidential policy content to unauthorized users
Answers must cite sources for all policy or compliance claims

Available Data / Models

1.8M internal documents: PDFs, Word docs, wiki pages, and policy manuals
Metadata per document: business unit, sensitivity tier, owner, effective date, region, and ACLs
Enterprise search index with BM25 support
Managed vector database approved for internal use
Access to OpenAI GPT-4.1-mini / GPT-4.1 and text-embedding-3-large
2,000 historical employee questions and 150 compliance-reviewed answers for seeding evaluation

Deliverables

Design the end-to-end RAG architecture, including ingestion, chunking, retrieval, reranking, generation, and permission filtering.
Define an evaluation plan before architecture: offline quality/safety benchmarks and online monitoring for quality, safety, latency, and cost.
Write a system prompt that enforces grounded answers, citations, refusal behavior, and resistance to prompt injection from retrieved content.
Estimate request-level and monthly cost/latency, and explain how you would stay within budget while meeting the hallucination target.
Identify the main failure modes in production and propose concrete mitigations and alerts.

Context

Constraints

p95 latency: 2,500ms for interactive queries
Cost ceiling: $35K/month at 1.2M queries/month
Hallucination ceiling: <2% unsupported factual claims on a labeled evaluation set
Prompt injection success rate: <0.5% on adversarial tests
Must respect document-level access controls and avoid exposing PII or confidential policy content to unauthorized users
Answers must cite sources for all policy or compliance claims

Available Data / Models

1.8M internal documents: PDFs, Word docs, wiki pages, and policy manuals
Metadata per document: business unit, sensitivity tier, owner, effective date, region, and ACLs
Enterprise search index with BM25 support
Managed vector database approved for internal use
Access to OpenAI GPT-4.1-mini / GPT-4.1 and text-embedding-3-large
2,000 historical employee questions and 150 compliance-reviewed answers for seeding evaluation

Deliverables

Design the end-to-end RAG architecture, including ingestion, chunking, retrieval, reranking, generation, and permission filtering.
Define an evaluation plan before architecture: offline quality/safety benchmarks and online monitoring for quality, safety, latency, and cost.
Write a system prompt that enforces grounded answers, citations, refusal behavior, and resistance to prompt injection from retrieved content.
Estimate request-level and monthly cost/latency, and explain how you would stay within budget while meeting the hallucination target.
Identify the main failure modes in production and propose concrete mitigations and alerts.

Context

Constraints

p95 latency: 2,500ms for interactive queries
Cost ceiling: $35K/month at 1.2M queries/month
Hallucination ceiling: <2% unsupported factual claims on a labeled evaluation set
Prompt injection success rate: <0.5% on adversarial tests
Must respect document-level access controls and avoid exposing PII or confidential policy content to unauthorized users
Answers must cite sources for all policy or compliance claims

Available Data / Models

1.8M internal documents: PDFs, Word docs, wiki pages, and policy manuals
Metadata per document: business unit, sensitivity tier, owner, effective date, region, and ACLs
Enterprise search index with BM25 support
Managed vector database approved for internal use
Access to OpenAI GPT-4.1-mini / GPT-4.1 and text-embedding-3-large
2,000 historical employee questions and 150 compliance-reviewed answers for seeding evaluation

Deliverables

Design the end-to-end RAG architecture, including ingestion, chunking, retrieval, reranking, generation, and permission filtering.
Define an evaluation plan before architecture: offline quality/safety benchmarks and online monitoring for quality, safety, latency, and cost.
Write a system prompt that enforces grounded answers, citations, refusal behavior, and resistance to prompt injection from retrieved content.
Estimate request-level and monthly cost/latency, and explain how you would stay within budget while meeting the hallucination target.
Identify the main failure modes in production and propose concrete mitigations and alerts.

Interview Guides

Context

Constraints

Available Data / Models

Deliverables

Design Enterprise Policy RAG Assistant

Context

Constraints

Available Data / Models

Deliverables

Your Answer

Design Enterprise Policy RAG Assistant

Context

Constraints

Available Data / Models

Deliverables

Design Enterprise Policy RAG Assistant

Context

Constraints

Available Data / Models

Deliverables

Your Answer