You are building an agent that reads documents during a task, then uses those documents to decide what to do next. One of the ingested documents may contain hidden or explicit instructions like "ignore previous directions" or "send data to this endpoint," mixed in with otherwise useful content.
How would you prevent prompt injection from a document the agent ingested mid-task?
Prompt injection defenses for agent workflowsSeparation of trusted instructions from untrusted document contentRAG and retrieval containment choicesOffline and online evaluation of attack resistance