Business Context
You’re building Support Copilot for CardForge, a fintech that issues credit cards and provides a mobile banking app to 18M monthly active users across the US and EU. The copilot is an LLM-powered assistant used by 6,000 human support agents to draft responses about disputes, chargebacks, KYC/AML verification, card shipping, and account freezes. Today, agents search an internal wiki and policy PDFs manually; average handle time is 11.5 minutes, and incorrect policy citations have caused regulatory escalations and chargeback losses.
CardForge wants a retrieval-augmented generation (RAG) system that reliably fetches the right policy clauses, runbooks, and product notes for the LLM to cite. The key risks are hallucinated policy (e.g., inventing refund windows) and stale guidance (e.g., outdated dispute timelines after a policy update).
Data Characteristics
You have three corpora:
- Policy & Compliance (high priority): ~45,000 pages across PDFs/HTML, including card network rules, KYC procedures, and legal terms.
- Operational Runbooks: ~12,000 Markdown pages (incident playbooks, escalation paths, tooling steps).
- Product Release Notes & Known Issues: ~30,000 short docs (100–600 words) with frequent updates.
Additional properties:
- Volume: ~2.8M text chunks expected in total after splitting
- Text length: 50–4,000 words per document; many tables, bullet lists, and section headers
- Language: English (90%), Spanish (7%), French (3%)
- Domain vocabulary: “chargeback reason code”, “provisional credit”, “3DS”, “KYC”, “SAR”, “merchant dispute”, “Reg E”, “SEPA”, “AML hold”
- Freshness: ~1,000 docs/week change; some policies are time-bound
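The ~2.8M-chunk estimate above implies splitting long documents into overlapping windows. A minimal word-window chunker is sketched below; the chunk size and overlap values are illustrative assumptions, not figures from this brief, and a real pipeline would also split on section headers and keep tables intact.

```python
def chunk_words(text: str, chunk_size: int = 300, overlap: int = 50) -> list[str]:
    """Split a document into overlapping word windows.

    chunk_size and overlap are illustrative defaults; the brief
    does not prescribe specific values. Overlap preserves context
    that straddles a chunk boundary (e.g., a clause split mid-sentence).
    """
    words = text.split()
    if not words:
        return []
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks
```

For a 650-word document with these defaults, this yields three chunks, each sharing 50 words with its neighbor.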
Success Criteria
A launch is considered successful if:
- Answer accuracy improves agent QA pass rate from 92% → 97%
- Hallucinated policy citations drop by 60% (measured via QA audits)
- Agent handle time decreases by 20%
- Retrieval returns at least one correct supporting passage in top-3 for ≥90% of audited tickets
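The last criterion is a recall@3 target over audited tickets. A minimal sketch of how it could be computed, assuming each audited ticket has a set of human-labeled relevant passage ids:

```python
def recall_at_k(results: dict, relevant: dict, k: int = 3) -> float:
    """Fraction of audited queries where at least one labeled-relevant
    passage id appears in the top-k retrieved results.

    results:  query_id -> list of retrieved passage ids, best first
    relevant: query_id -> set of passage ids judged correct by auditors
    """
    if not results:
        return 0.0
    hits = 0
    for query_id, retrieved in results.items():
        rel = relevant.get(query_id, set())
        if any(doc_id in rel for doc_id in retrieved[:k]):
            hits += 1
    return hits / len(results)
```

The ≥90% launch bar would then translate to `recall_at_k(results, relevant, k=3) >= 0.90` on the audit set.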
Constraints
- Latency: p95 end-to-end retrieval (query → top-k passages) < 250 ms in the agent UI
- Security/Compliance: EU data residency; access control by team (Disputes vs KYC vs Fraud)
- Explainability: Must show citations with doc title, section, and last-updated timestamp
- Cost: Embedding + indexing must fit within a fixed monthly budget; avoid re-embedding everything daily
Requirements (Deliverables)
- Propose an end-to-end retrieval design for RAG (ingestion → chunking → embedding → indexing → ranking → context assembly).
- Specify how you will handle chunking, tables, and long PDFs (including overlap strategy).
- Choose embedding models and justify multilingual handling.
- Implement a hybrid retrieval approach (lexical + dense) with a cross-encoder reranker.
- Describe how you will enforce document-level ACLs and freshness.
- Define an offline evaluation plan (gold set creation, metrics) and an online monitoring plan.
- Provide a fallback strategy for low-confidence retrieval (e.g., ask clarifying questions, escalate, or broaden search).
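For the hybrid retrieval deliverable, one common way to merge the lexical and dense candidate lists before the cross-encoder stage is reciprocal rank fusion (RRF). The sketch below assumes two best-first ranked lists of passage ids (e.g., one from BM25, one from a dense index) are already available; it is one reasonable fusion choice, not the only one.

```python
def rrf_fuse(ranked_lists: list[list[str]], k: int = 60,
             top_n: int = 20) -> list[str]:
    """Reciprocal rank fusion over multiple ranked candidate lists.

    Each inner list holds passage ids ordered best-first. k=60 is the
    conventional RRF constant; top_n bounds how many fused candidates
    are passed to the cross-encoder reranker, which then re-scores
    (query, passage) pairs jointly.
    """
    scores: dict[str, float] = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

RRF needs no score calibration between the two retrievers, which matters here because BM25 and cosine-similarity scores are on incomparable scales; the max fused score (or the reranker's top score) also gives a natural confidence signal for the low-confidence fallback path.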