Context
NorthBridge Bank wants an internal LLM assistant for relationship managers and compliance analysts. The assistant should answer questions about approved policies, product disclosures, KYC/AML procedures, and internal operating manuals without giving ungrounded financial or regulatory advice.
Constraints
- p95 latency: at most 2,500 ms for interactive Q&A
- Cost ceiling: $35K/month at 40K queries/day
- Hallucination ceiling: <1% on a 400-question golden set covering regulated topics
- 100% of factual answers must include citations to approved sources
- Must not reveal PII, account data, or confidential policy content outside the user's authorization scope
- Must resist prompt injection from uploaded files or retrieved documents
- Must produce auditable logs of retrieval results, model version, and final response
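The cost and hallucination ceilings above translate into per-query and per-evaluation numbers that are easier to design against. The following is a minimal sketch of that arithmetic; the 30-day month is an assumption, and all variable names are illustrative.

```python
# Figures come from the constraints above; DAYS_PER_MONTH is an assumption.
MONTHLY_BUDGET_USD = 35_000
QUERIES_PER_DAY = 40_000
DAYS_PER_MONTH = 30  # assumption: the brief does not define "month"
GOLDEN_SET_SIZE = 400
HALLUCINATION_CEILING_PERCENT = 1  # the rate must be strictly below this

queries_per_month = QUERIES_PER_DAY * DAYS_PER_MONTH
cost_per_query_usd = MONTHLY_BUDGET_USD / queries_per_month

# Largest number of hallucinated answers k such that k / 400 < 1%.
# Integer arithmetic avoids floating-point edge cases at the boundary.
max_hallucinated = max(
    k
    for k in range(GOLDEN_SET_SIZE + 1)
    if k * 100 < HALLUCINATION_CEILING_PERCENT * GOLDEN_SET_SIZE
)

print(f"queries/month:      {queries_per_month:,}")
print(f"budget per query:   ${cost_per_query_usd:.4f}")
print(f"max hallucinations: {max_hallucinated} of {GOLDEN_SET_SIZE}")
```

Under these assumptions the all-in budget is roughly $0.029 per query (retrieval, model calls, and logging combined), and a strict "<1%" reading allows at most 3 hallucinated answers on the 400-question set.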
Available Resources
- 250K approved internal documents: policy PDFs, compliance manuals, product term sheets, call-center scripts, and regulatory interpretations
- Document metadata: jurisdiction, business line, effective date, approval status, confidentiality tier
- Existing hybrid search stack (BM25 + vector search)
- Approved models: a fast low-cost model and a higher-accuracy model for fallback
- 20 compliance SMEs available to label a golden set and review failures
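The metadata fields listed above are enough to sketch a pre-retrieval eligibility filter, so that unauthorized or unapproved documents never reach the model. This is one possible shape, not a prescribed design: the `DocMeta` and `UserScope` types, the `"approved"` status value, and the integer encoding of confidentiality tiers (higher means more restricted) are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DocMeta:
    doc_id: str
    jurisdiction: str
    business_line: str
    effective_date: date
    approval_status: str       # hypothetical values: "approved", "draft", ...
    confidentiality_tier: int  # hypothetical encoding: higher = more restricted


@dataclass(frozen=True)
class UserScope:
    jurisdictions: frozenset[str]
    business_lines: frozenset[str]
    max_tier: int


def is_retrievable(doc: DocMeta, user: UserScope, today: date) -> bool:
    """True only if the document is approved, in effect, and within the user's scope."""
    return (
        doc.approval_status == "approved"
        and doc.effective_date <= today
        and doc.jurisdiction in user.jurisdictions
        and doc.business_line in user.business_lines
        and doc.confidentiality_tier <= user.max_tier
    )


def filter_candidates(docs: list[DocMeta], user: UserScope, today: date) -> list[DocMeta]:
    """Apply the eligibility check before (or as a filter inside) hybrid search."""
    return [d for d in docs if is_retrievable(d, user, today)]
```

In practice the same predicate would likely be pushed down into the BM25 and vector indexes as a filter rather than applied in Python. Note that `effective_date` alone only excludes documents that are not yet in effect; detecting superseded documents would need an expiry or supersession field that the listed metadata does not include.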
Task
- Design a production-ready RAG system for this regulated environment, including access control, retrieval, prompt design, and refusal behavior.
- Define an evaluation plan before making architecture decisions: offline metrics, adversarial tests, and online monitoring for hallucination, policy violations, and user trust.
- Specify how you would mitigate the main deployment risks in finance: hallucinations, stale regulations, prompt injection, PII leakage, and unauthorized disclosure.
- Estimate cost and latency at target volume, and explain what tradeoffs you would make if the quality target cannot be met within budget or latency exceeds the SLA.
- Provide a concise implementation sketch in Python using a real LLM SDK and structured outputs where useful.
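As an illustration of where structured outputs help with the citation and refusal requirements, the sketch below validates a model response after it has been parsed into a typed object. It deliberately omits the LLM SDK call itself; `AssistantAnswer`, `enforce_grounding`, and `REFUSAL_TEXT` are hypothetical names, and the rule shown (refuse unless every cited ID was actually retrieved) is one reasonable policy, not the only one.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical fixed refusal message returned when grounding cannot be verified.
REFUSAL_TEXT = "I can't answer that from the approved sources available to you."


@dataclass
class AssistantAnswer:
    """Fields a structured-output schema might request from the model (assumed)."""
    answer: str
    citations: list[str] = field(default_factory=list)  # IDs of cited documents
    refused: bool = False


def enforce_grounding(candidate: AssistantAnswer, retrieved_ids: set[str]) -> AssistantAnswer:
    """Return the candidate only if it cites at least one document and every
    cited ID was in the retrieval results; otherwise return a refusal."""
    if candidate.refused:
        return AssistantAnswer(answer=REFUSAL_TEXT, refused=True)
    cited = set(candidate.citations)
    if not cited or not cited <= retrieved_ids:
        return AssistantAnswer(answer=REFUSAL_TEXT, refused=True)
    return candidate


# Usage: the model cites a document that was never retrieved, so the answer is replaced.
retrieved = {"POL-001", "KYC-014"}
draft = AssistantAnswer(answer="Enhanced due diligence applies.", citations=["AML-999"])
final = enforce_grounding(draft, retrieved)
print(final.refused, final.citations)
```

A check like this runs after the model call and before the response is logged and returned, which keeps the citation requirement enforceable in code rather than relying on the prompt alone.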