You are building an LLM agent that can run shell commands to help with engineering tasks such as reading logs, inspecting files, and running safe diagnostics. The agent is useful, but shell access creates obvious risk: prompt injection from tool output, destructive commands, secret exposure, and confident but unsafe reasoning about what to execute.
Walk me through defense in depth for an agent with shell access. What layers would you put in place across prompting, tool design, execution sandboxing, permissions, monitoring, and evaluation to reduce prompt injection, hallucinated actions, and harmful command execution?
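One concrete instance of the tool-design, sandboxing and monitoring layers the question asks about, as an added example that is not part of the original prompt: a policy gate in front of the shell tool that allowlists binaries and flags, confines path operands to a workspace, never invokes a shell, audits every decision, and hands command output back to the model as delimited data flagged when it looks like injected instructions. The workspace root /srv/app, the allowlist and the injection regex are constructed assumptions for the example.

```python
import re
import shlex
import subprocess
from pathlib import Path

WORKSPACE = Path("/srv/app").resolve()   # hypothetical sandbox root (constructed for this example)
# Allowlisted binaries and the only flags each may take; anything else is denied.
ALLOWED = {"cat": {"-n"}, "head": {"-n"}, "tail": {"-n"},
           "ls": {"-l", "-a", "-h"}, "wc": {"-l", "-c"}}
META = set("|&;<>`$(){}")                  # we never invoke a shell, and refuse these outright
INJECTION = re.compile(r"ignore (all )?(previous|prior) instructions|you are now|system prompt", re.I)

def check_command(cmd: str):
    """Return (allowed, reason, argv); every denial carries a reason that can be logged."""
    try:
        argv = shlex.split(cmd)
    except ValueError as exc:
        return False, f"unparseable: {exc}", []
    if not argv:
        return False, "empty command", []
    if any(ch in META for ch in cmd):
        return False, "shell metacharacter rejected", argv
    prog, args = argv[0], argv[1:]
    if prog not in ALLOWED:
        return False, f"binary not allowlisted: {prog}", argv
    for arg in args:
        if arg.startswith("-"):
            if arg not in ALLOWED[prog]:
                return False, f"flag not allowed for {prog}: {arg}", argv
        else:
            # Operands (including the "20" in "-n 20") must resolve inside the workspace,
            # which blocks ../ traversal and absolute paths such as /etc/passwd.
            target = (WORKSPACE / arg).resolve()
            if target != WORKSPACE and WORKSPACE not in target.parents:
                return False, f"path outside workspace: {arg}", argv
    return True, "ok", argv

def wrap_tool_output(text: str, max_len: int = 4000) -> str:
    """Return output to the model as delimited data, flagged when it reads like instructions."""
    attrs = ' suspected_injection="true"' if INJECTION.search(text) else ""
    return f"<tool_output{attrs}>\n{text[:max_len]}\n</tool_output>"

def run_gated(cmd: str, timeout: float = 10.0) -> str:
    """Gate, audit, then execute without a shell, inside the workspace, with a minimal env."""
    allowed, reason, argv = check_command(cmd)
    print(f"AUDIT cmd={cmd!r} allowed={allowed} reason={reason!r}")  # swap for structured logging
    if not allowed:
        return f"<tool_error>denied: {reason}</tool_error>"
    proc = subprocess.run(argv, shell=False, cwd=WORKSPACE, env={"PATH": "/usr/bin:/bin"},
                          capture_output=True, text=True, timeout=timeout)
    return wrap_tool_output(proc.stdout + proc.stderr)
```

The gate is one layer only: it belongs inside a container or VM with no secrets mounted, and its audit lines are what the monitoring and evaluation layers consume.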
Topics: Agent design for high-risk tools; Prompt injection defenses; Hallucination containment for actions; Eval-first thinking for safety-critical systems