A customer reports a sudden drop in performance in a production data pipeline. You need to diagnose whether the slowdown is coming from ingestion, stream processing, storage, orchestration, or downstream transformations, and determine the fastest path to isolate the issue.
- Pipeline latency by stage
- Kafka lag and partition skew
- Spark batch duration and executor pressure
- Snowflake load and query queue time
- Airflow task failures, retries, and SLA misses
- DLQ growth and schema validation failures
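
One way to use these signals for triage is to compare each stage's current latency with its pre-incident baseline and start with the stage that regressed the most. The sketch below is a minimal illustration of that idea, not a definitive method. The `StageLatency` type, the function names, the stage labels, and all numbers are hypothetical; in practice the values would come from whatever monitoring already covers the pipeline.

```python
from dataclasses import dataclass


@dataclass
class StageLatency:
    stage: str         # hypothetical stage label, e.g. "kafka_ingestion"
    baseline_s: float  # typical latency before the slowdown, in seconds
    current_s: float   # latency observed now, in seconds


def rank_by_slowdown(stages):
    """Return (stage, current/baseline ratio) pairs, worst regression first."""
    ratios = [
        (s.stage, s.current_s / s.baseline_s)
        for s in stages
        if s.baseline_s > 0  # skip stages with no usable baseline
    ]
    return sorted(ratios, key=lambda pair: pair[1], reverse=True)


def partition_skew(lag_by_partition):
    """Ratio of the largest partition lag to the mean lag.

    A value near 1.0 means lag is spread evenly; a large value suggests
    one hot partition. Returns 0.0 when there is no lag at all.
    """
    lags = list(lag_by_partition.values())
    mean = sum(lags) / len(lags) if lags else 0.0
    return max(lags) / mean if mean > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical measurements, for illustration only.
    observed = [
        StageLatency("kafka_ingestion", 2.0, 2.2),
        StageLatency("spark_processing", 30.0, 95.0),
        StageLatency("snowflake_load", 12.0, 13.0),
        StageLatency("airflow_orchestration", 5.0, 5.0),
    ]
    for stage, ratio in rank_by_slowdown(observed):
        print(f"{stage}: {ratio:.2f}x baseline")
    print("partition skew:", partition_skew({0: 100, 1: 100, 2: 1000}))
```

Ranking by ratio rather than by absolute latency keeps a naturally slow stage from masking a fast stage that has degraded sharply. The skew check is kept separate because high Kafka lag concentrated in one partition points to a different cause (a hot key or a stuck consumer) than lag that is uniformly high.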