Context
AcmeHealth, a B2B healthcare analytics company, runs nightly ETL pipelines that ingest client files delivered via SFTP or pulled from APIs, validate them, transform them into Snowflake marts, and publish QA-approved datasets to client-facing dashboards. Today, dependencies among engineering, QA, and client teams are tracked manually in spreadsheets and Slack. This causes missed handoffs, delayed releases, and unclear ownership whenever upstream files or validation sign-offs arrive late.
You need to design a dependency-aware pipeline orchestration process that makes technical and human dependencies explicit, blocks downstream execution when prerequisites are unmet, and provides clear visibility into status, SLA risk, and failure recovery.
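The following is a minimal, orchestrator-agnostic sketch of what "blocks downstream execution when prerequisites are unmet" could look like. It is a dependency graph in which every node carries an owner group and a gate kind (automated or manual). The names `Owner`, `Kind`, `Node`, `unmet_prerequisites`, and `can_run` are hypothetical and are not Airflow APIs. How the gate state would be persisted (for example, in Airflow metadata or a Snowflake table) is left open.

```python
from dataclasses import dataclass, field
from enum import Enum


class Owner(Enum):
    ENGINEERING = "engineering"  # ingestion and transforms
    QA = "qa"                    # validation and sign-off
    CLIENT = "client"            # file delivery and SLA commitments


class Kind(Enum):
    AUTOMATED = "automated"  # resolved by the system, e.g. task success or a passing check
    MANUAL = "manual"        # resolved by a person, e.g. QA approval or exception acknowledgment


@dataclass
class Node:
    name: str
    owner: Owner
    kind: Kind
    satisfied: bool = False
    upstream: list[str] = field(default_factory=list)


def unmet_prerequisites(graph: dict[str, Node], target: str) -> list[Node]:
    """Walk upstream from target and return every unsatisfied prerequisite.

    A satisfied node is treated as a boundary: its own ancestors are not visited,
    so the result lists only the prerequisites still holding up this run.
    """
    unmet: list[Node] = []
    seen: set[str] = set()
    stack = list(graph[target].upstream)
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        node = graph[name]
        if not node.satisfied:
            unmet.append(node)
            stack.extend(node.upstream)
    return unmet


def can_run(graph: dict[str, Node], target: str) -> bool:
    """A node may start only when nothing upstream is still blocking it."""
    return not unmet_prerequisites(graph, target)


if __name__ == "__main__":
    # One client's nightly chain: delivery -> schema check -> transform -> QA sign-off -> publish.
    graph = {
        "file_delivered": Node("file_delivered", Owner.CLIENT, Kind.AUTOMATED, satisfied=True),
        "schema_check": Node("schema_check", Owner.ENGINEERING, Kind.AUTOMATED,
                             satisfied=True, upstream=["file_delivered"]),
        "transform": Node("transform", Owner.ENGINEERING, Kind.AUTOMATED,
                          satisfied=True, upstream=["schema_check"]),
        "qa_signoff": Node("qa_signoff", Owner.QA, Kind.MANUAL, upstream=["transform"]),
        "publish": Node("publish", Owner.ENGINEERING, Kind.AUTOMATED, upstream=["qa_signoff"]),
    }
    print(can_run(graph, "publish"))  # False
    print([(n.name, n.owner.value, n.kind.value) for n in unmet_prerequisites(graph, "publish")])
    # [('qa_signoff', 'qa', 'manual')]
```

Treating a satisfied node as a boundary keeps the blocker list short. Once QA has signed off, for example, an earlier late file no longer appears. Because each blocker carries its owner, the same list could feed the "dependency owners" column of a status dashboard.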
Scale Requirements
- Clients: 180 enterprise clients
- Inbound feeds: 1,200 files/day across SFTP drops and API pulls
- Daily volume: 2.5 TB of raw CSV/JSON data
- Orchestration load: ~8,000 Airflow task instances/day
- Latency target: client dashboards updated by 6:00 AM in each client's local time zone
- QA throughput: 300 validation suites/night across staging and production
- Retention: 1 year for raw data, 3 years for curated warehouse tables
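The scale figures imply a few averages worth sanity-checking. Because the 6:00 AM target is per client, the deadlines also fall at different UTC instants. The sketch below is a back-of-envelope calculation that assumes decimal units (1 TB = 1,000 GB), a 365-day year, and no compression. `dashboard_deadline_utc` is a hypothetical helper.

```python
from datetime import date, datetime, time, timedelta, timezone, tzinfo

# Figures copied from the scale requirements.
CLIENTS = 180
FILES_PER_DAY = 1_200
RAW_TB_PER_DAY = 2.5
TASK_INSTANCES_PER_DAY = 8_000
RAW_RETENTION_DAYS = 365  # assumption: "1 year" = 365 days

# Derived averages (decimal units; real feeds are likely to be skewed).
avg_file_gb = RAW_TB_PER_DAY * 1_000 / FILES_PER_DAY     # ~2.08 GB per file
tasks_per_client = TASK_INSTANCES_PER_DAY / CLIENTS      # ~44.4 task instances per client
tasks_per_file = TASK_INSTANCES_PER_DAY / FILES_PER_DAY  # ~6.7 task instances per file
raw_retained_tb = RAW_TB_PER_DAY * RAW_RETENTION_DAYS    # 912.5 TB before compression


def dashboard_deadline_utc(run_date: date, client_tz: tzinfo) -> datetime:
    """Return 6:00 AM on run_date in the client's time zone, converted to UTC."""
    local_deadline = datetime.combine(run_date, time(6, 0), tzinfo=client_tz)
    return local_deadline.astimezone(timezone.utc)


if __name__ == "__main__":
    print(f"{avg_file_gb:.2f} GB/file, {tasks_per_client:.1f} tasks/client, {raw_retained_tb} TB raw")
    # Fixed UTC-5 offset for illustration only; see the note on zoneinfo below.
    est = timezone(timedelta(hours=-5))
    print(dashboard_deadline_utc(date(2024, 1, 15), est))  # 2024-01-15 11:00:00+00:00
```

For real deadlines, pass `zoneinfo.ZoneInfo("America/New_York")` (or the client's own IANA zone) instead of a fixed offset so that daylight saving time is handled. Sorting clients by this UTC deadline gives a natural processing priority.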
Requirements
- Model dependencies across three groups: engineering-owned ingestion/transforms, QA-owned validation/sign-off, and client-owned file delivery/SLA commitments.
- Design orchestration that supports both automated dependencies (task completion, data quality checks) and manual gates (QA approval, client exception acknowledgment).
- Prevent downstream loads when upstream files are missing, schema checks fail, or QA approval is incomplete.
- Support idempotent reruns, backfills for missed client deliveries, and per-client dependency overrides.
- Provide status dashboards showing blocked tasks, dependency owners, expected unblock times, and SLA breach risk.
- Define monitoring, alerting, and escalation paths for late files, failed validations, and stuck approvals.
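One way to fill the "expected unblock time" and "SLA breach risk" columns, and to decide who is notified, is to project each blocked chain's finish time against its deadline. This minimal sketch makes several assumptions: the owner supplies an unblock estimate, and downstream runtime is estimated separately (for example, from recent run history). The 30-minute buffer, the risk labels, and the `qa:owner` / `engineering:on-call` routing targets are hypothetical placeholders, not a prescribed policy.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class BlockedItem:
    client: str
    dependency: str               # e.g. "qa_signoff"
    owner: str                    # "engineering", "qa", or "client"
    expected_unblock: datetime    # owner-provided estimate, timezone-aware UTC
    remaining_runtime: timedelta  # estimated downstream runtime once unblocked
    deadline: datetime            # dashboard deadline, timezone-aware UTC


def sla_risk(item: BlockedItem, now: datetime,
             buffer: timedelta = timedelta(minutes=30)) -> str:
    """Classify SLA risk from the projected finish time of the blocked chain."""
    # An overdue estimate slides forward with the clock instead of staying in the past.
    start = max(now, item.expected_unblock)
    slack = item.deadline - (start + item.remaining_runtime)
    if slack < timedelta(0):
        return "BREACH_PROJECTED"
    if slack < buffer:
        return "AT_RISK"
    return "ON_TRACK"


def escalation_targets(item: BlockedItem, risk: str, now: datetime) -> list[str]:
    """Hypothetical routing: owner first, engineering on-call only for projected breaches."""
    targets: list[str] = []
    overdue = now > item.expected_unblock  # e.g. a stuck approval or a late file
    if risk != "ON_TRACK" or overdue:
        targets.append(f"{item.owner}:owner")
    if risk == "BREACH_PROJECTED" and item.owner != "engineering":
        targets.append("engineering:on-call")
    return targets


if __name__ == "__main__":
    now = datetime(2024, 1, 15, 8, 0, tzinfo=timezone.utc)
    item = BlockedItem(
        client="client-042",
        dependency="qa_signoff",
        owner="qa",
        expected_unblock=datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc),
        remaining_runtime=timedelta(hours=1, minutes=45),
        deadline=datetime(2024, 1, 15, 11, 0, tzinfo=timezone.utc),
    )
    risk = sla_risk(item, now)
    print(risk, escalation_targets(item, risk, now))
    # BREACH_PROJECTED ['qa:owner', 'engineering:on-call']
```

This design notifies the owning group first and pages engineering on-call only on a projected breach. That is one way to respect the limited overnight coverage noted in the constraints.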
Constraints
- Existing stack is AWS + Snowflake; avoid introducing more than one major new platform.
- Team has 3 data engineers, 2 QA analysts, and limited on-call coverage overnight.
- Must support HIPAA-aligned auditability for approvals and data release events.
- Incremental budget cap: $15K/month.
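For HIPAA-aligned auditability of approvals and release events, one common tamper-evidence technique is a hash chain. Each entry stores the SHA-256 hash of the previous entry, so editing an earlier record invalidates every later hash. HIPAA itself does not prescribe this mechanism. The sketch below, including `AuditLog`, the action names, and the JSON layout, is illustrative only.

```python
import hashlib
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: dict) -> str:
    # Canonical JSON (sorted keys, fixed separators) so the same body always hashes the same.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass
class AuditLog:
    entries: list[dict] = field(default_factory=list)

    def append(self, actor: str, action: str, subject: str, at: datetime) -> dict:
        """Record an approval or release event chained to the previous entry."""
        body = {
            "actor": actor,
            "action": action,
            "subject": subject,
            "at": at.astimezone(timezone.utc).isoformat(),
            "prev_hash": self.entries[-1]["hash"] if self.entries else GENESIS_HASH,
        }
        entry = {**body, "hash": _digest(body)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every hash; edits, reordering, or removal of any non-final entry break the chain."""
        prev_hash = GENESIS_HASH
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body.get("prev_hash") != prev_hash or _digest(body) != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True


if __name__ == "__main__":
    log = AuditLog()
    t = datetime(2024, 1, 15, 9, 0, tzinfo=timezone.utc)
    log.append("qa.analyst", "QA_APPROVED", "client-042/2024-01-14", t)
    log.append("release-bot", "DATA_RELEASED", "client-042/2024-01-14", t)
    print(log.verify())  # True
    log.entries[0]["actor"] = "someone.else"
    print(log.verify())  # False
```

A hash chain alone has two gaps. It does not stop someone who can rewrite the whole log and recompute every hash, and it does not detect removal of the newest entry. To close these gaps, the latest hash would need to be anchored somewhere independent, for example in an S3 bucket with Object Lock enabled.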