You are building an agent that can draft and send emails on a user's behalf. The feature is useful for follow-ups, scheduling, and customer communication, but it could also be abused to send spam, impersonate people, or generate phishing messages.
How would you prevent the agent from being weaponized for spam or phishing while keeping it useful for legitimate email tasks?
Agent guardrails for high-risk actions
Prompt injection awareness from untrusted context
Hallucination and deception containment
Eval design for safety-critical LLM features