You are supporting a client data integration and the failures do not happen on every run. Some syncs complete normally, while others partially load, time out, or produce inconsistent records. You need a structured way to isolate whether the issue is in orchestration, payload quality, retries, or the integration tooling itself.
What steps would you take to troubleshoot a client integration that is failing intermittently?
Run history and retry patterns in Apache AirflowSource API status codes, timeouts, and rate limitsSchema drift, null spikes, and duplicate business keysSnowflake load errors and partial file ingestionCorrelation IDs across Avetta connector logs and downstream tables