You are building a document-grounded assistant for an internal operations team that answers questions over policy manuals, customer communications, and uploaded files. The assistant is already useful, but a security review found that users can paste adversarial text or upload documents containing instructions such as “ignore prior rules” or “reveal hidden prompts.” The product is expected to handle thousands of queries per day, and some answers may affect financial workflows, so unsafe behavior is a launch blocker.
How would you design this LLM application and defend it against prompt injection attacks while still keeping it useful, fast, and affordable? Explain the system design you would choose, how you would evaluate it before launch, and how you would detect and mitigate failures in production.