

You are working on a customer service chatbot that answers account, billing, and policy questions. Because it interacts directly with users, it will face attempts to override its instructions, extract its hidden prompts, or elicit unsafe answers through jailbreak-style prompts. You need a strategy that mitigates these attacks without making the bot so restrictive that it fails legitimate users.
What strategies would you implement to protect a customer service chatbot from prompt injection and jailbreaking attempts?