You are supporting a client data integration whose failures do not occur on every run: some syncs complete normally, while others load only partially, time out, or produce inconsistent records. You need a structured way to isolate whether the problem lies in orchestration, payload quality, retry handling, or the integration tooling itself.
What steps would you take to troubleshoot a client integration that is failing intermittently?
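One way to make the isolation structured is to record the same facts for every run and sort each failed run into the layer to investigate first. The sketch below assumes you can reconstruct these facts from orchestrator logs and target-system row counts. The `RunRecord` fields, the status labels, and the rule order in the hypothetical `suspect_layer` function are illustrative assumptions, not part of any specific integration tool.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class RunRecord:
    """One sync run. All field names here are hypothetical placeholders."""
    run_id: str
    status: str                 # assumed labels: "success", "partial", "timeout", "inconsistent"
    attempts: int               # 1 means the run was not retried
    rows_expected: int          # row count reported by the source
    rows_loaded: int            # row count observed in the target
    invalid_records: int        # records rejected by payload validation
    overlapped_prior_run: bool  # run started before the previous run finished


def suspect_layer(run: RunRecord) -> str:
    """Heuristically return the layer to investigate first for one run."""
    if run.status == "success":
        return "none"
    if run.invalid_records > 0:
        return "payload"        # bad or unexpected data in the batch
    if run.overlapped_prior_run:
        return "orchestration"  # concurrent or mis-scheduled runs
    if run.attempts > 1 and run.rows_loaded > run.rows_expected:
        return "retries"        # replayed batches written non-idempotently
    return "tooling"            # residual bucket once the other causes are ruled out


def triage(runs: list[RunRecord]) -> Counter:
    """Count failed runs per suspected layer to see where failures cluster."""
    return Counter(suspect_layer(r) for r in runs if r.status != "success")


# Hypothetical run history used only to illustrate the triage.
runs = [
    RunRecord("r1", "success", 1, 100, 100, 0, False),
    RunRecord("r2", "partial", 1, 100, 80, 20, False),
    RunRecord("r3", "timeout", 1, 100, 40, 0, True),
    RunRecord("r4", "inconsistent", 3, 100, 130, 0, False),
    RunRecord("r5", "timeout", 2, 100, 0, 0, False),
]
print(triage(runs))
```

The checks run from payload to tooling because data and scheduling problems are usually cheaper to confirm than defects in the tool itself, so "tooling" is treated as the remaining suspect rather than the first guess.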