Context
C3 AI wants to add a grounded enterprise assistant inside the C3 AI Application Platform so operations, reliability, and field teams can ask natural-language questions over internal manuals, runbooks, asset records, and policy documents. The assistant must answer with citations and avoid unsafe or fabricated guidance.
Constraints
- p95 end-to-end latency: < 2.5 seconds
- Cost ceiling: < $25K/month at 1.2M queries/month
- Hallucination ceiling: < 2% on a labeled evaluation set
- Prompt-injection success rate: ~0% on a curated adversarial test suite
- Must respect enterprise permissions and avoid exposing restricted or PII-bearing content
- If evidence is weak or conflicting, the system should ask a clarifying question or refuse rather than guess
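The cost and latency ceilings above imply a per-query budget worth computing before any architecture decisions. A minimal back-of-envelope sketch; the blended token price below is an illustrative assumption, not a quoted rate:

```python
# Per-query budget implied by the stated constraints.
MONTHLY_BUDGET_USD = 25_000
MONTHLY_QUERIES = 1_200_000

per_query_budget = MONTHLY_BUDGET_USD / MONTHLY_QUERIES
print(f"Per-query budget: ${per_query_budget:.4f}")  # ≈ $0.0208

# ASSUMPTION: a blended (input + output) LLM price per 1K tokens.
assumed_price_per_1k_tokens = 0.004

# Tokens affordable per query if the LLM call consumed the entire budget
# (retrieval, embedding, and infra costs would shrink this further).
max_tokens_per_query = per_query_budget / assumed_price_per_1k_tokens * 1_000
print(f"Max tokens/query at assumed price: {max_tokens_per_query:.0f}")
```

At roughly two cents per query, the main levers are prompt/context length, model tier routing, and caching of frequent questions.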
Available Data / Models
- 2M enterprise documents in C3 AI Data Lake: SOPs, maintenance logs, incident reports, equipment manuals, compliance policies, and knowledge-base articles
- Metadata per document: business unit, ACL, timestamp, source system, asset ID, region
- Approved LLMs via enterprise gateway (GPT-4.1 / Claude-class models) and embedding models
- A managed vector index plus keyword search available through C3 AI Search-style services
- 5,000 historical user questions and 300 SME-labeled Q&A pairs to bootstrap evaluation
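The per-document metadata above (ACL, business unit, etc.) is what makes permission-aware hybrid retrieval possible: filter by ACL before scoring, then blend vector and keyword signals. A minimal sketch; the `Chunk` shape, the linear blend, and the `alpha` weight are illustrative assumptions, not the platform's API:

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    doc_id: str
    text: str
    acl: set            # groups permitted to read the source document
    business_unit: str
    vector_score: float = 0.0
    keyword_score: float = 0.0

def hybrid_rank(chunks, user_groups, alpha=0.7, k=5):
    """ACL-filter first so restricted content never enters ranking,
    then blend dense and keyword scores. alpha=0.7 is an assumption."""
    visible = [c for c in chunks if c.acl & user_groups]
    ranked = sorted(
        visible,
        key=lambda c: alpha * c.vector_score + (1 - alpha) * c.keyword_score,
        reverse=True,
    )
    return ranked[:k]
```

Filtering on ACL before ranking (rather than post-filtering the top-k) is the safer default: it prevents restricted documents from influencing scores or leaking via truncated result sets.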
Deliverables
- Design the end-to-end RAG architecture: ingestion, chunking, embeddings, indexing, retrieval, reranking, generation, and permission filtering.
- Define an evaluation-first plan with offline and online metrics for answer quality, retrieval quality, hallucination, refusal quality, and safety.
- Specify how you would defend against prompt injection, stale content, ACL leakage, and unsupported answers.
- Estimate cost and latency at target volume, including the main levers you would use to stay within budget.
- Describe the production monitoring dashboard you would build and the alerts or rollback criteria you would use after launch.
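One way the generation stage and the weak-evidence refusal requirement could fit together is an explicit support gate ahead of the LLM call. A sketch under stated assumptions: `min_support`, `min_score`, the prompt wording, and the `llm` callable are all illustrative, not a prescribed implementation:

```python
def answer_with_citations(question, retrieved, llm,
                          min_support=2, min_score=0.5):
    """Ask a clarifying question when evidence is weak; otherwise answer
    strictly from numbered excerpts with inline citations.
    Thresholds are illustrative assumptions."""
    strong = [c for c in retrieved if c["score"] >= min_score]
    if len(strong) < min_support:
        return {
            "type": "clarify",
            "text": ("I couldn't find enough supporting documents to answer "
                     "safely. Could you narrow the asset, region, or time range?"),
        }
    context = "\n\n".join(f"[{i+1}] {c['text']}" for i, c in enumerate(strong))
    prompt = (
        "Answer ONLY from the numbered excerpts below and cite them like [1]. "
        "If the excerpts do not answer the question, say so.\n\n"
        f"Excerpts:\n{context}\n\nQuestion: {question}"
    )
    return {"type": "answer", "text": llm(prompt),
            "citations": [c["doc_id"] for c in strong]}
```

Keeping the gate outside the prompt means refusal behavior is testable and tunable without re-prompting, which matters for hitting the < 2% hallucination ceiling.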
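The 300 SME-labeled Q&A pairs are enough to bootstrap the offline half of the evaluation plan. A minimal sketch of two core metrics; the data shapes (`qa_pairs`, per-answer judgment labels) are illustrative assumptions:

```python
def recall_at_k(qa_pairs, retrieve, k=5):
    """Fraction of labeled questions whose gold document appears in the
    top-k retrieved ids. qa_pairs: [(question, gold_doc_id)];
    retrieve(question) -> ranked list of doc ids."""
    hits = sum(1 for question, gold in qa_pairs
               if gold in retrieve(question)[:k])
    return hits / len(qa_pairs)

def hallucination_rate(judgments):
    """judgments: SME label per response, one of
    'supported', 'unsupported', or 'refused'.
    Rate = unsupported answers / all *answered* questions, so that
    refusals are tracked separately rather than hiding hallucinations."""
    answered = [j for j in judgments if j != "refused"]
    return sum(j == "unsupported" for j in answered) / max(len(answered), 1)
```

Excluding refusals from the hallucination denominator keeps the two constraints independently measurable: refusal quality (over-refusal rate) gets its own metric instead of silently improving the hallucination number.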