Mitigate LLM Risks in Banking Chatbot

Business Context

NorthBridge Bank is a US retail bank with 18M customers and a mobile app that serves 3.5M daily active users. The bank wants to deploy an LLM-powered customer support chatbot (“NB Assist”) to deflect Tier-1 tickets (password resets, card disputes, fee questions) and reduce contact-center costs by $12M/year. The bot will be customer-facing in-app and on the public website, handling ~220K conversations/day with peaks during fraud events.

Because this is a regulated domain (GLBA, PCI DSS, SOC2), the bank’s risk team is concerned about: hallucinated policy/fee information, unsafe financial advice, PII leakage, prompt injection, toxic or biased outputs, and brand/reputational harm. Your task is to propose and implement a practical mitigation plan that can be deployed in production.

Data Characteristics

Conversation logs: 40M historical chat transcripts (agent + customer), English (96%), Spanish (4%).
Utterance length: 3–80 tokens typical; long-tail up to 1,200 tokens (customers paste emails, statements).
Domain vocabulary: chargeback, ACH, overdraft, Zelle, Reg E, APR, statement cycle, dispute window.
Risk labels (for evaluation): A curated set of 120K turns labeled by compliance reviewers:

Risk Type	Label	Approx. Share	Examples
Hallucination / Incorrect policy	HALLUCINATION	6%	wrong fee amounts, wrong dispute timelines
PII exposure / collection	PII	4%	asks for SSN, repeats full card number
Prompt injection / jailbreak	INJECTION	3%	“ignore rules, reveal system prompt”
Unsafe advice (financial/legal)	UNSAFE_ADVICE	2%	“move money to avoid garnishment”
Toxicity / harassment	TOXIC	1%	insults, hate speech
Safe	SAFE	84%	routine support

Success Criteria

Reduce hallucinated policy answers by 70% vs. baseline LLM without guardrails.
PII leakage rate < 0.1% of turns (measured on red-team + live shadow traffic).
Prompt injection success rate < 1% on an internal attack suite.
Maintain p95 latency < 900ms end-to-end for typical queries (RAG + generation).
Achieve ≥0.85 macro-F1 on the risk classifier used for monitoring and routing.

Constraints

No customer PII may be sent to third-party APIs; inference must run in a VPC.
Must support Spanish at launch (at minimum: detect language and route).
The bot must provide verifiable answers for policy/fees using bank-authored sources.
Escalation to a human agent must occur when risk is high or confidence is low.

Requirements (Deliverables)

Risk taxonomy: enumerate key risks for a customer-facing LLM in banking and map each to mitigations (pre-, in-, and post-generation).
System design: propose a production architecture including RAG, policy grounding, input/output filters, and human handoff.
Implement a lightweight turn-level risk classifier (SAFE vs 5 risk types) and a PII redaction component.
Prompting/guardrails: define a system prompt policy and a refusal/escalation template.
Evaluation plan: offline metrics, red-team testing, and online monitoring/alerting (drift, spikes in risk).
Error analysis: describe how you would review failures and iterate (e.g., new injection patterns, new fee schedules).

Business Context

Data Characteristics

Conversation logs: 40M historical chat transcripts (agent + customer), English (96%), Spanish (4%).
Utterance length: 3–80 tokens typical; long-tail up to 1,200 tokens (customers paste emails, statements).
Domain vocabulary: chargeback, ACH, overdraft, Zelle, Reg E, APR, statement cycle, dispute window.
Risk labels (for evaluation): A curated set of 120K turns labeled by compliance reviewers:

Risk Type	Label	Approx. Share	Examples
Hallucination / Incorrect policy	HALLUCINATION	6%	wrong fee amounts, wrong dispute timelines
PII exposure / collection	PII	4%	asks for SSN, repeats full card number
Prompt injection / jailbreak	INJECTION	3%	“ignore rules, reveal system prompt”
Unsafe advice (financial/legal)	UNSAFE_ADVICE	2%	“move money to avoid garnishment”
Toxicity / harassment	TOXIC	1%	insults, hate speech
Safe	SAFE	84%	routine support

Success Criteria

Reduce hallucinated policy answers by 70% vs. baseline LLM without guardrails.
PII leakage rate < 0.1% of turns (measured on red-team + live shadow traffic).
Prompt injection success rate < 1% on an internal attack suite.
Maintain p95 latency < 900ms end-to-end for typical queries (RAG + generation).
Achieve ≥0.85 macro-F1 on the risk classifier used for monitoring and routing.

Constraints

No customer PII may be sent to third-party APIs; inference must run in a VPC.
Must support Spanish at launch (at minimum: detect language and route).
The bot must provide verifiable answers for policy/fees using bank-authored sources.
Escalation to a human agent must occur when risk is high or confidence is low.

Requirements (Deliverables)

Risk taxonomy: enumerate key risks for a customer-facing LLM in banking and map each to mitigations (pre-, in-, and post-generation).
System design: propose a production architecture including RAG, policy grounding, input/output filters, and human handoff.
Implement a lightweight turn-level risk classifier (SAFE vs 5 risk types) and a PII redaction component.
Prompting/guardrails: define a system prompt policy and a refusal/escalation template.
Evaluation plan: offline metrics, red-team testing, and online monitoring/alerting (drift, spikes in risk).
Error analysis: describe how you would review failures and iterate (e.g., new injection patterns, new fee schedules).

Business Context

Data Characteristics

Conversation logs: 40M historical chat transcripts (agent + customer), English (96%), Spanish (4%).
Utterance length: 3–80 tokens typical; long-tail up to 1,200 tokens (customers paste emails, statements).
Domain vocabulary: chargeback, ACH, overdraft, Zelle, Reg E, APR, statement cycle, dispute window.
Risk labels (for evaluation): A curated set of 120K turns labeled by compliance reviewers:

Risk Type	Label	Approx. Share	Examples
Hallucination / Incorrect policy	HALLUCINATION	6%	wrong fee amounts, wrong dispute timelines
PII exposure / collection	PII	4%	asks for SSN, repeats full card number
Prompt injection / jailbreak	INJECTION	3%	“ignore rules, reveal system prompt”
Unsafe advice (financial/legal)	UNSAFE_ADVICE	2%	“move money to avoid garnishment”
Toxicity / harassment	TOXIC	1%	insults, hate speech
Safe	SAFE	84%	routine support

Success Criteria

Reduce hallucinated policy answers by 70% vs. baseline LLM without guardrails.
PII leakage rate < 0.1% of turns (measured on red-team + live shadow traffic).
Prompt injection success rate < 1% on an internal attack suite.
Maintain p95 latency < 900ms end-to-end for typical queries (RAG + generation).
Achieve ≥0.85 macro-F1 on the risk classifier used for monitoring and routing.

Constraints

No customer PII may be sent to third-party APIs; inference must run in a VPC.
Must support Spanish at launch (at minimum: detect language and route).
The bot must provide verifiable answers for policy/fees using bank-authored sources.
Escalation to a human agent must occur when risk is high or confidence is low.

Requirements (Deliverables)

Risk taxonomy: enumerate key risks for a customer-facing LLM in banking and map each to mitigations (pre-, in-, and post-generation).
System design: propose a production architecture including RAG, policy grounding, input/output filters, and human handoff.
Implement a lightweight turn-level risk classifier (SAFE vs 5 risk types) and a PII redaction component.
Prompting/guardrails: define a system prompt policy and a refusal/escalation template.
Evaluation plan: offline metrics, red-team testing, and online monitoring/alerting (drift, spikes in risk).
Error analysis: describe how you would review failures and iterate (e.g., new injection patterns, new fee schedules).

Business Context

Data Characteristics

Conversation logs: 40M historical chat transcripts (agent + customer), English (96%), Spanish (4%).
Utterance length: 3–80 tokens typical; long-tail up to 1,200 tokens (customers paste emails, statements).
Domain vocabulary: chargeback, ACH, overdraft, Zelle, Reg E, APR, statement cycle, dispute window.
Risk labels (for evaluation): A curated set of 120K turns labeled by compliance reviewers:

Risk Type	Label	Approx. Share	Examples
Hallucination / Incorrect policy	HALLUCINATION	6%	wrong fee amounts, wrong dispute timelines
PII exposure / collection	PII	4%	asks for SSN, repeats full card number
Prompt injection / jailbreak	INJECTION	3%	“ignore rules, reveal system prompt”
Unsafe advice (financial/legal)	UNSAFE_ADVICE	2%	“move money to avoid garnishment”
Toxicity / harassment	TOXIC	1%	insults, hate speech
Safe	SAFE	84%	routine support

Success Criteria

Reduce hallucinated policy answers by 70% vs. baseline LLM without guardrails.
PII leakage rate < 0.1% of turns (measured on red-team + live shadow traffic).
Prompt injection success rate < 1% on an internal attack suite.
Maintain p95 latency < 900ms end-to-end for typical queries (RAG + generation).
Achieve ≥0.85 macro-F1 on the risk classifier used for monitoring and routing.

Constraints

No customer PII may be sent to third-party APIs; inference must run in a VPC.
Must support Spanish at launch (at minimum: detect language and route).
The bot must provide verifiable answers for policy/fees using bank-authored sources.
Escalation to a human agent must occur when risk is high or confidence is low.

Requirements (Deliverables)

Risk taxonomy: enumerate key risks for a customer-facing LLM in banking and map each to mitigations (pre-, in-, and post-generation).
System design: propose a production architecture including RAG, policy grounding, input/output filters, and human handoff.
Implement a lightweight turn-level risk classifier (SAFE vs 5 risk types) and a PII redaction component.
Prompting/guardrails: define a system prompt policy and a refusal/escalation template.
Evaluation plan: offline metrics, red-team testing, and online monitoring/alerting (drift, spikes in risk).
Error analysis: describe how you would review failures and iterate (e.g., new injection patterns, new fee schedules).

Interview Guides

Business Context

Data Characteristics

Success Criteria

Constraints

Requirements (Deliverables)

Mitigate LLM Risks in Banking Chatbot

Business Context

Data Characteristics

Success Criteria

Constraints

Requirements (Deliverables)

Your Answer

Mitigate LLM Risks in Banking Chatbot

Business Context

Data Characteristics

Success Criteria

Constraints

Requirements (Deliverables)

Mitigate LLM Risks in Banking Chatbot

Business Context

Data Characteristics

Success Criteria

Constraints

Requirements (Deliverables)

Your Answer