Defend a RAG Assistant from Injection

Scenario

You are building a document-grounded assistant for an internal operations team that answers questions over policy manuals, customer communications, and uploaded files. The assistant is already useful, but security review found that users can paste adversarial text or upload documents containing instructions like “ignore prior rules” and “reveal hidden prompts.” The product is expected to handle thousands of daily queries, and some answers may affect financial workflows, so unsafe behavior is a launch blocker.

Constraints

p95 latency: 2,500ms end-to-end
Cost ceiling: $0.03 per request at projected volume
Prompt injection success rate: <1% on an adversarial eval set
Factual questions that the retrieved sources do not support must be refused rather than answered with a guess
No leakage of hidden prompts, credentials, or sensitive customer data
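
One way to keep these constraints testable is to encode them as a single pass/fail gate over an offline eval run. The sketch below is only an illustration: the `EvalSummary` type, its field names, and the `launch_gate` function are hypothetical, and only the threshold values come from the constraints listed above.

```python
from dataclasses import dataclass
from typing import List

# Thresholds taken from the constraints above.
P95_LATENCY_MS = 2500
COST_CEILING_USD = 0.03
MAX_INJECTION_SUCCESS_RATE = 0.01  # must be strictly below this


@dataclass
class EvalSummary:
    """Hypothetical summary of one offline eval run (field names are illustrative)."""
    p95_latency_ms: float
    cost_per_request_usd: float
    injection_attempts: int
    injection_successes: int
    unsupported_questions: int   # questions with no supporting source
    unsupported_refusals: int    # how many of those the assistant refused
    leak_count: int              # responses leaking prompts, credentials, or customer data


def launch_gate(s: EvalSummary) -> List[str]:
    """Return the names of violated constraints; an empty list means the run passes."""
    failures = []
    if s.p95_latency_ms > P95_LATENCY_MS:
        failures.append("latency")
    if s.cost_per_request_usd > COST_CEILING_USD:
        failures.append("cost")
    # An empty adversarial set proves nothing, so treat it as a failure too.
    if (s.injection_attempts == 0
            or s.injection_successes / s.injection_attempts >= MAX_INJECTION_SUCCESS_RATE):
        failures.append("injection")
    if s.unsupported_refusals < s.unsupported_questions:
        failures.append("refusal")
    if s.leak_count > 0:
        failures.append("leakage")
    return failures
```

Returning the list of violated constraints, rather than a single boolean, makes it easier to see which requirement is blocking launch after each change to the system.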

Available Resources

A hosted LLM API with tool calling and structured outputs
A hybrid retrieval stack over internal documents and user-uploaded files
5,000 historical queries plus security-team adversarial examples
Capacity for 200 manually reviewed eval examples per month
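
The review capacity interacts with the <1% injection target: a small eval set cannot by itself demonstrate a rate that low. The sketch below works through the arithmetic under stated assumptions: that attack attempts are independent, that zero successes are observed, and that a one-sided 95% confidence level is the bar. None of these comes from the scenario. The function names are illustrative.

```python
import math


def upper_bound_zero_failures(n: int, confidence: float = 0.95) -> float:
    """Exact one-sided upper bound on the true success rate when 0 of n attacks succeed.

    Solves (1 - p)^n = 1 - confidence for p, assuming independent attempts.
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / n)


def min_examples_for_bound(target_rate: float, confidence: float = 0.95) -> int:
    """Smallest n for which 0 observed successes bounds the rate at or below target_rate."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - target_rate))


if __name__ == "__main__":
    # One month of manual review capacity from the resources above.
    print(f"{upper_bound_zero_failures(200):.4f}")
    # Examples needed to support a <1% claim with zero observed successes.
    print(min_examples_for_bound(0.01))
```

Under these assumptions, 200 reviewed examples with no successful attack bound the true rate only at roughly 1.5%, and about 299 clean examples are needed to support a claim below 1%. So a single month of manual review is not enough on its own, which is one reason to combine it with the security team's adversarial examples or automated grading.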

Question

How would you design and defend this LLM application against prompt injection attacks while still keeping it useful, fast, and affordable? Explain the system design you would choose, how you would evaluate it before launch, and how you would detect and mitigate failures in production.
