You are building an LLM agent that can answer user requests by calling tools. In testing, the agent keeps choosing the wrong tool, passing malformed arguments, or calling a tool when it should answer directly. You need a disciplined way to debug the behavior before shipping.
How do you debug a tool the agent keeps calling incorrectly?
Tool choice vs expected toolArgument schema violationsCases that should have asked a clarifying questionWhether the tool output itself is causing confusion or injection