Dataford
Interview Guides
Upgrade
All questions/Generative AI & LLMs/Defend a RAG Assistant from Injection

Defend a RAG Assistant from Injection

Hard
Generative AI & LLMs
RAGPrompt EngineeringPrompt Injection

Problem

Scenario

You are building a document-grounded assistant for an internal operations team that answers questions over policy manuals, customer communications, and uploaded files. The assistant is already useful, but security review found that users can paste adversarial text or upload documents containing instructions like “ignore prior rules” and “reveal hidden prompts.” The product is expected to handle thousands of daily queries, and some answers may affect financial workflows, so unsafe behavior is a launch blocker.

Constraints

  • p95 latency: 2,500ms end-to-end
  • Cost ceiling: $0.03 per request at projected volume
  • Prompt injection success rate: <1% on an adversarial eval set
  • Unsupported factual answers must refuse rather than guess
  • No leakage of hidden prompts, credentials, or sensitive customer data

Available Resources

  • A hosted LLM API with tool calling and structured outputs
  • A hybrid retrieval stack over internal documents and user-uploaded files
  • 5,000 historical queries plus security-team adversarial examples
  • Capacity for 200 manually reviewed eval examples per month

Question

How would you design and defend this LLM application against prompt injection attacks while still keeping it useful, fast, and affordable? Explain the system design you would choose, how you would evaluate it before launch, and how you would detect and mitigate failures in production.

Problem

Scenario

You are building a document-grounded assistant for an internal operations team that answers questions over policy manuals, customer communications, and uploaded files. The assistant is already useful, but security review found that users can paste adversarial text or upload documents containing instructions like “ignore prior rules” and “reveal hidden prompts.” The product is expected to handle thousands of daily queries, and some answers may affect financial workflows, so unsafe behavior is a launch blocker.

Constraints

  • p95 latency: 2,500ms end-to-end
  • Cost ceiling: $0.03 per request at projected volume
  • Prompt injection success rate: <1% on an adversarial eval set
  • Unsupported factual answers must refuse rather than guess
  • No leakage of hidden prompts, credentials, or sensitive customer data

Available Resources

  • A hosted LLM API with tool calling and structured outputs
  • A hybrid retrieval stack over internal documents and user-uploaded files
  • 5,000 historical queries plus security-team adversarial examples
  • Capacity for 200 manually reviewed eval examples per month

Question

How would you design and defend this LLM application against prompt injection attacks while still keeping it useful, fast, and affordable? Explain the system design you would choose, how you would evaluate it before launch, and how you would detect and mitigate failures in production.

Your answer
Try one AI text evaluation on us
Get structured feedback, scored against a 4-axis rubric. Premium unlocks unlimited.
0 wordstarget ~200
Up next
Improve RAG Answer QualityHardMonitor RAG Assistant Failure ModesHardGrounded Support Assistant Prompt DesignMedium
Next question