Business Context
You’re interviewing for an NLP Engineer role at MercuryPay, a fintech app with 18M monthly active users that offers checking accounts, debit cards, and international transfers. MercuryPay’s customer support team handles ~220K tickets/day across chat and email. Many tickets require referencing fast-changing internal policy docs (fee schedules, dispute rules, compliance playbooks). The company wants to deploy a Retrieval-Augmented Generation (RAG) assistant that drafts accurate, policy-grounded responses and reduces average handle time by 30%, while meeting strict financial compliance requirements.
Unlike a pure LLM chatbot, this assistant must cite the exact policy passages used, avoid hallucinating fees/limits, and gracefully escalate when the knowledge base doesn’t contain an answer.
Data Characteristics
MercuryPay maintains:
- Knowledge base (KB): ~85,000 documents (HTML/PDF/Markdown), ~14 GB of text after extraction.
- Document length: 200–12,000 tokens (median ~1,100)
- Update frequency: 5–10% of docs change weekly (new promotions, regulatory updates)
- Domain vocabulary: NACHA returns, chargebacks, KYC, AML, MCC codes, interchange, SWIFT/IBAN
- Ticket stream: 220K/day
- User message length: 5–600 words (median ~55)
- Languages: English 88%, Spanish 7%, Portuguese 3%, other 2%
- PII: names, emails, phone numbers, last-4 of SSN, bank account fragments
- Ground truth: 1.5M historical tickets with final agent responses; only ~35% have clean links to the KB articles used.
Success Criteria
A “good” RAG system must:
- Achieve ≥80% “grounded answer rate” (answer supported by retrieved passages) on an internal evaluation set (a minimal metric sketch follows this list).
- Reduce policy-related escalations by 25% without increasing compliance incidents.
- Provide p95 latency ≤ 1.2s for retrieval + generation at 50 QPS.
- Produce responses that include citations (doc id + section heading) and a confidence / escalation decision.
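To make the grounded-answer-rate target measurable, here is a minimal sketch of the metric. The `EvalItem` shape and the `is_supported` judge are assumptions for illustration: in practice the judge would be an NLI model or LLM grader that checks each factual claim against the retrieved passages, not the substring placeholder used here.

```python
from dataclasses import dataclass

@dataclass
class EvalItem:
    answer: str          # generated response
    passages: list[str]  # passages retrieved for the same query

def is_supported(answer: str, passages: list[str]) -> bool:
    """Placeholder judge. In practice: an NLI model or LLM grader that
    verifies every factual claim in `answer` against `passages`."""
    return any(answer.lower() in p.lower() for p in passages)

def grounded_answer_rate(items: list[EvalItem]) -> float:
    """Fraction of answers fully supported by their retrieved passages."""
    if not items:
        return 0.0
    return sum(is_supported(it.answer, it.passages) for it in items) / len(items)
```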
Constraints
- No PII may be stored in the vector DB; queries must be redacted before indexing or logging (see the redaction sketch after this list).
- Must run in a regulated environment: all prompts, retrieved passages, and outputs are auditable for 7 years.
- Model budget: one A10/T4-class GPU per service replica; the embedding model must run on CPU or a small GPU.
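A minimal sketch of the redaction contract, assuming regex patterns for the PII classes listed under Data Characteristics. The patterns and the `redact` helper are illustrative only; a production system would layer a trained PII detector on top, but the invariant is the same: nothing is indexed or logged before this pass runs.

```python
import re

# Illustrative patterns only; a production system would pair these
# with a trained PII detector before indexing or logging anything.
PII_PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "PHONE": re.compile(r"\b(?:\+?1[-. ]?)?\(?\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}\b"),
    "SSN_LAST4": re.compile(r"\b(?:ssn|social)\D{0,10}\d{4}\b", re.IGNORECASE),
    "ACCOUNT": re.compile(r"\b\d{8,17}\b"),  # bank account fragments
}

def redact(text: str) -> str:
    """Replace PII spans with typed placeholders before indexing/logging."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(redact("Customer jane@x.com called from 415-555-0142 about acct 12345678."))
# -> "Customer [EMAIL] called from [PHONE] about acct [ACCOUNT]."
```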
Requirements (Deliverables)
- Explain RAG end-to-end: ingestion → chunking → embeddings → vector index → retrieval → prompt construction → generation → citations.
- Design a chunking and indexing strategy for long policy docs, including how you handle tables and headings (a heading-aware chunking sketch follows this list).
- Propose a retrieval approach (dense, sparse, hybrid) and justify it for fintech policy text.
- Implement a minimal RAG pipeline in Python: ingest a small KB, build an index, retrieve top-k, and generate an answer with citations (a hybrid retrieval and prompt-construction sketch closes this section).
- Define an evaluation plan: offline metrics (retrieval + generation), human review rubrics, and production monitoring.
- Describe at least 3 failure modes (e.g., stale docs, semantic drift, prompt injection) and mitigations.
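As a starting point for the chunking deliverable, here is a minimal heading-aware splitter for Markdown policy docs. `chunk_markdown`, the 350-token budget, and the whitespace token proxy are all assumptions for illustration; the key properties are that chunks never cross heading boundaries and each chunk carries its heading path, so citations can show `doc_id + section heading`.

```python
import re

def chunk_markdown(doc_id: str, text: str, max_tokens: int = 350) -> list[dict]:
    """Split a Markdown doc into chunks that respect heading boundaries.

    Each chunk records its heading path for `doc_id + section` citations.
    Token counts are approximated by whitespace splitting; swap in a real
    tokenizer for production.
    """
    chunks, buf, heading_path = [], [], []

    def flush():
        if buf:
            chunks.append({
                "doc_id": doc_id,
                "section": " > ".join(heading_path) or "(root)",
                "text": "\n".join(buf),
            })
            buf.clear()

    for line in text.splitlines():
        m = re.match(r"^(#{1,6})\s+(.*)", line)
        if m:                       # new heading: close the current chunk
            flush()
            level = len(m.group(1))
            heading_path[:] = heading_path[:level - 1] + [m.group(2).strip()]
            continue
        buf.append(line)
        # Oversized section: flush at the budget, but never on a Markdown
        # table row, so tables stay inside a single chunk.
        size = sum(len(l.split()) for l in buf)
        if size >= max_tokens and not line.lstrip().startswith("|"):
            flush()

    flush()
    return chunks
```

Tables are the hard case: the sketch merely avoids flushing mid-table, but long fee tables can exceed any budget, and a common fallback is to serialize each row together with its header so fee values stay attached to their column names.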
Your answer should be practical: assume you will ship an MVP in 6–8 weeks, then iterate.
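Finally, to ground the hybrid-retrieval and pipeline deliverables, here is a compact end-to-end sketch over the chunks produced above. It fuses BM25 (via the third-party `rank_bm25` package) with dense cosine scores using reciprocal rank fusion; the `embed` function is a toy hashed bag-of-words placeholder for a real CPU-friendly encoder, `MiniRAG` and the prompt template are illustrative, and the actual LLM call is omitted.

```python
import math
import re

from rank_bm25 import BM25Okapi  # third-party: pip install rank-bm25

def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())

def embed(text: str, dim: int = 256) -> list[float]:
    """Toy hashed bag-of-words vector, L2-normalized. Placeholder for a
    real small embedding model; stable only within one process (Python
    salts str hashes per run)."""
    vec = [0.0] * dim
    for tok in tokenize(text):
        vec[hash(tok) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def rrf(rankings: list[list[int]], k: int = 60) -> list[int]:
    """Reciprocal rank fusion: robust to incomparable score scales."""
    scores: dict[int, float] = {}
    for ranking in rankings:
        for rank, idx in enumerate(ranking):
            scores[idx] = scores.get(idx, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

class MiniRAG:
    """Hybrid (sparse + dense) retrieval over chunks from chunk_markdown."""

    def __init__(self, chunks: list[dict]):
        self.chunks = chunks
        self.bm25 = BM25Okapi([tokenize(c["text"]) for c in chunks])
        self.vecs = [embed(c["text"]) for c in chunks]

    def retrieve(self, query: str, top_k: int = 4) -> list[dict]:
        sparse = self.bm25.get_scores(tokenize(query))
        qvec = embed(query)
        dense = [sum(x * y for x, y in zip(qvec, v)) for v in self.vecs]
        by_sparse = sorted(range(len(self.chunks)), key=lambda i: -sparse[i])
        by_dense = sorted(range(len(self.chunks)), key=lambda i: -dense[i])
        fused = rrf([by_sparse, by_dense])
        return [self.chunks[i] for i in fused[:top_k]]

    def build_prompt(self, query: str, passages: list[dict]) -> str:
        """Citation-bearing prompt; the LLM call itself is omitted here."""
        ctx = "\n\n".join(
            f"[{p['doc_id']} / {p['section']}]\n{p['text']}" for p in passages
        )
        return (
            "Answer using ONLY the passages below, citing each fact as "
            "[doc_id / section]. If the passages do not contain the answer, "
            "say so and recommend escalation.\n\n"
            f"{ctx}\n\nQuestion: {query}\nAnswer:"
        )
```

RRF sidesteps score calibration between the two retrievers, which matters for fintech policy text where exact identifiers (e.g., NACHA return codes, MCC numbers) must surface even when semantic similarity is weak; the prompt template also bakes in the escalation behavior from the success criteria.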