Context
C3 AI wants to add a grounded enterprise assistant inside the C3 AI Application Platform so operations, reliability, and field teams can ask natural-language questions over internal manuals, runbooks, asset records, and policy documents. The assistant must answer with citations and avoid unsafe or fabricated guidance.
Constraints
- p95 end-to-end latency: < 2.5 seconds
- Cost ceiling: < $25K/month at 1.2M queries/month
- Hallucination ceiling: < 2% on a labeled evaluation set
- Prompt-injection success rate: ~0% on a curated adversarial test suite
- Must respect enterprise permissions and avoid exposing restricted or PII-bearing content
- If evidence is weak or conflicting, the system should ask a clarifying question or refuse rather than guess
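The cost and latency ceilings above imply a per-query budget worth computing before any architecture decisions. A minimal back-of-envelope sketch; the blended token price below is an illustrative assumption, not a quoted rate:

```python
# Per-query budget implied by the stated constraints.
MONTHLY_BUDGET_USD = 25_000
MONTHLY_QUERIES = 1_200_000

per_query_budget = MONTHLY_BUDGET_USD / MONTHLY_QUERIES
print(f"Per-query budget: ${per_query_budget:.4f}")  # ≈ $0.0208

# ASSUMPTION: a blended (input + output) LLM price per 1K tokens.
assumed_price_per_1k_tokens = 0.004

# Tokens affordable per query if the LLM call consumed the entire budget
# (retrieval, embedding, and infra costs would shrink this further).
max_tokens_per_query = per_query_budget / assumed_price_per_1k_tokens * 1_000
print(f"Max tokens/query at assumed price: {max_tokens_per_query:.0f}")
```

At roughly two cents per query, the main levers are prompt/context length, model tier routing, and caching of frequent questions.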
Available Data / Models
- 2M enterprise documents in C3 AI Data Lake: SOPs, maintenance logs, incident reports, equipment manuals, compliance policies, and knowledge-base articles
- Metadata per document: business unit, ACL, timestamp, source system, asset ID, region
- Approved LLMs via enterprise gateway (GPT-4.1 / Claude-class models) and embedding models
- A managed vector index plus keyword search available through C3 AI Search-style services
- 5,000 historical user questions and 300 SME-labeled Q&A pairs to bootstrap evaluation
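The per-document metadata above (ACL, business unit, etc.) is what makes permission-aware hybrid retrieval possible: filter by ACL before scoring, then blend vector and keyword signals. A minimal sketch; the `Chunk` shape, the linear blend, and the `alpha` weight are illustrative assumptions, not the platform's API:

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    doc_id: str
    text: str
    acl: set            # groups permitted to read the source document
    business_unit: str
    vector_score: float = 0.0
    keyword_score: float = 0.0

def hybrid_rank(chunks, user_groups, alpha=0.7, k=5):
    """ACL-filter first so restricted content never enters ranking,
    then blend dense and keyword scores. alpha=0.7 is an assumption."""
    visible = [c for c in chunks if c.acl & user_groups]
    ranked = sorted(
        visible,
        key=lambda c: alpha * c.vector_score + (1 - alpha) * c.keyword_score,
        reverse=True,
    )
    return ranked[:k]
```

Filtering on ACL before ranking (rather than post-filtering the top-k) is the safer default: it prevents restricted documents from influencing scores or leaking via truncated result sets.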
Deliverables
- Design the end-to-end RAG architecture: ingestion, chunking, embeddings, indexing, retrieval, reranking, generation, and permission filtering.
- Define an evaluation-first plan with offline and online metrics for answer quality, retrieval quality, hallucination, refusal quality, and safety.
- Specify how you would defend against prompt injection, stale content, ACL leakage, and unsupported answers.
- Estimate cost and latency at target volume, including the main levers you would use to stay within budget.
- Describe the production monitoring dashboard you would build and the alerts or rollback criteria you would use after launch.
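One way the generation stage and the weak-evidence refusal requirement could fit together is an explicit support gate ahead of the LLM call. A sketch under stated assumptions: `min_support`, `min_score`, the prompt wording, and the `llm` callable are all illustrative, not a prescribed implementation:

```python
def answer_with_citations(question, retrieved, llm,
                          min_support=2, min_score=0.5):
    """Ask a clarifying question when evidence is weak; otherwise answer
    strictly from numbered excerpts with inline citations.
    Thresholds are illustrative assumptions."""
    strong = [c for c in retrieved if c["score"] >= min_score]
    if len(strong) < min_support:
        return {
            "type": "clarify",
            "text": ("I couldn't find enough supporting documents to answer "
                     "safely. Could you narrow the asset, region, or time range?"),
        }
    context = "\n\n".join(f"[{i+1}] {c['text']}" for i, c in enumerate(strong))
    prompt = (
        "Answer ONLY from the numbered excerpts below and cite them like [1]. "
        "If the excerpts do not answer the question, say so.\n\n"
        f"Excerpts:\n{context}\n\nQuestion: {question}"
    )
    return {"type": "answer", "text": llm(prompt),
            "citations": [c["doc_id"] for c in strong]}
```

Keeping the gate outside the prompt means refusal behavior is testable and tunable without re-prompting, which matters for hitting the < 2% hallucination ceiling.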
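The 300 SME-labeled Q&A pairs are enough to bootstrap the offline half of the evaluation plan. A minimal sketch of two core metrics; the data shapes (`qa_pairs`, per-answer judgment labels) are illustrative assumptions:

```python
def recall_at_k(qa_pairs, retrieve, k=5):
    """Fraction of labeled questions whose gold document appears in the
    top-k retrieved ids. qa_pairs: [(question, gold_doc_id)];
    retrieve(question) -> ranked list of doc ids."""
    hits = sum(1 for question, gold in qa_pairs
               if gold in retrieve(question)[:k])
    return hits / len(qa_pairs)

def hallucination_rate(judgments):
    """judgments: SME label per response, one of
    'supported', 'unsupported', or 'refused'.
    Rate = unsupported answers / all *answered* questions, so that
    refusals are tracked separately rather than hiding hallucinations."""
    answered = [j for j in judgments if j != "refused"]
    return sum(j == "unsupported" for j in answered) / max(len(answered), 1)
```

Excluding refusals from the hallucination denominator keeps the two constraints independently measurable: refusal quality (over-refusal rate) gets its own metric instead of silently improving the hallucination number.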