Context
A Databricks customer runs a lakehouse ETL platform that ingests Salesforce, NetSuite, application logs, and CDC feeds into Delta Lake, then publishes curated marts for finance and product analytics. Today, orchestration is split across cron jobs and ad hoc notebooks. The platform team wants to standardize on a single workflow orchestrator and is evaluating Apache Airflow, Dagster, and Prefect. It is also considering how much of that orchestration could instead be handled natively by Databricks Workflows, Delta Live Tables, and Lakeflow Connect.
Your task is to design the target orchestration approach and compare these tools in the context of a Databricks-first stack.
Scale Requirements
- Pipelines: 320 scheduled pipelines, 40 event-triggered pipelines
- Task runs: ~28,000/day across dev, staging, and prod
- Data volume: 14 TB/day ingested, 2.1 PB retained in Delta Lake
- Latency targets: batch SLA < 30 minutes for bronze-to-silver; critical CDC pipelines < 5 minutes
- Concurrency: up to 180 parallel task executions during peak windows
- Reliability: 99.9% of scheduled runs succeed each month
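To make these figures easier to reason about, the sketch below converts them into average rates and a monthly failure budget. It is a rough back-of-the-envelope calculation, not part of the brief. It assumes a 30-day month, decimal units (1 TB = 1,000,000 MB), and that the 99.9% target is applied to all ~28,000 daily task runs. The brief states the target for scheduled runs only, so the real budget is likely smaller. The function name `scale_summary` is illustrative.

```python
# Figures taken from the scale requirements above.
TASK_RUNS_PER_DAY = 28_000
INGEST_TB_PER_DAY = 14
RETAINED_TB = 2_100          # 2.1 PB expressed in TB
SUCCESS_TARGET = 0.999

# Assumptions (not stated in the brief).
DAYS_PER_MONTH = 30
MB_PER_TB = 1_000_000        # decimal units
SECONDS_PER_DAY = 24 * 60 * 60


def scale_summary() -> dict:
    """Derive average rates and a monthly failure budget from the stated scale."""
    monthly_runs = TASK_RUNS_PER_DAY * DAYS_PER_MONTH
    return {
        # Average only; peak windows reach 180 parallel executions.
        "avg_task_runs_per_second": TASK_RUNS_PER_DAY / SECONDS_PER_DAY,
        "avg_ingest_mb_per_second": INGEST_TB_PER_DAY * MB_PER_TB / SECONDS_PER_DAY,
        "monthly_task_runs": monthly_runs,
        # Upper bound: assumes every task run counts toward the 99.9% target.
        "monthly_failure_budget": round(monthly_runs * (1 - SUCCESS_TARGET)),
        # Retained volume expressed as days of ingest at the current rate.
        "retention_in_ingest_days": RETAINED_TB / INGEST_TB_PER_DAY,
    }


if __name__ == "__main__":
    for name, value in scale_summary().items():
        print(f"{name}: {value:,.2f}")
```

Under these assumptions the platform averages roughly 0.3 task runs and 162 MB of ingest per second, may fail at most about 840 task runs per month, and retains the equivalent of 150 days of ingest. The averages are low, so the peak concurrency of 180 and the CDC latency target are the figures more likely to drive sizing.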
Requirements
- Compare Airflow, Dagster, and Prefect for orchestrating Databricks Jobs, Delta Live Tables, dbt on Databricks, and external dependencies.
- Propose a target architecture that supports batch ETL, limited streaming dependencies, backfills, retries, and idempotent reruns.
- Explain how you would model lineage, asset dependencies, and environment promotion across dev/staging/prod.
- Define how secrets, RBAC, CI/CD, and multi-team ownership would work in Databricks.
- Specify monitoring for SLA misses, failed runs, data quality regressions, and cost anomalies.
- Discuss when native Databricks orchestration should be preferred over an external orchestrator.
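As one possible reading of the backfill and idempotent-rerun requirement, the sketch below plans a backfill over daily partitions and skips partitions that already succeeded, so re-running the same plan does no duplicate work. It is a minimal, orchestrator-agnostic illustration and not an Airflow, Dagster, Prefect, or Databricks API. The names `run_key`, `plan_backfill`, and `silver_orders`, and the choice of daily partitions, are assumptions made for the example.

```python
from datetime import date, timedelta


def run_key(pipeline: str, partition: date) -> str:
    """Deterministic key: the same pipeline and partition always map to the same run."""
    return f"{pipeline}:{partition.isoformat()}"


def plan_backfill(pipeline: str, start: date, end: date, completed: set[str]) -> list[str]:
    """Return run keys for daily partitions in [start, end] that have not completed yet."""
    days = (end - start).days + 1
    keys = [run_key(pipeline, start + timedelta(days=i)) for i in range(days)]
    return [key for key in keys if key not in completed]


if __name__ == "__main__":
    # Hypothetical state: 2 January already succeeded in an earlier run.
    done = {"silver_orders:2024-01-02"}
    pending = plan_backfill("silver_orders", date(2024, 1, 1), date(2024, 1, 3), done)
    print(pending)

    # After the pending partitions succeed, planning again yields no work.
    done.update(pending)
    print(plan_backfill("silver_orders", date(2024, 1, 1), date(2024, 1, 3), done))
```

Keying runs by pipeline and partition, not by wall-clock trigger time, is what makes a retry or rerun safe to repeat. In practice each partition's write would also need to be idempotent, for example by overwriting or merging into that partition and never blindly appending to it.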
Constraints
- Primary compute and storage must remain on Databricks Lakehouse.
- Team size: 6 data engineers, 2 analytics engineers, 1 platform engineer.
- Minimal operational overhead is preferred over maximum flexibility.
- SOX-sensitive finance pipelines require auditable deployments and run history.
- Budget allows one orchestration platform, not multiple overlapping control planes.