Context
FinEdge, a mid-size fintech company, runs 25 batch ETL pipelines that move transaction, customer, and ledger data from PostgreSQL and S3 into Snowflake. Deployments are currently manual: engineers merge code, run ad hoc tests, update Airflow DAGs by hand, and promote changes directly to production. This leads to failed runs, inconsistent environments, and slow rollbacks.
You need to design an automated deployment process for the data platform so pipeline code, SQL transformations, and orchestration changes can be promoted safely from development to production.
Scale Requirements
- Pipelines: 25 Airflow DAGs, growing to 80 within 12 months
- Deploy frequency: 15-20 releases/week across DAGs, dbt models, and shared libraries
- Data volume: ~4 TB/day processed in batch windows
- Deployment SLA: production deployment in < 15 minutes after approval
- Rollback target: restore previous working version in < 10 minutes
- Availability: no missed scheduled runs during deployment windows
Requirements
- Design a CI/CD process for ETL code, dbt models, and Airflow DAGs using Git-based workflows.
- Validate deployments with automated unit tests, SQL/data quality checks, and DAG parsing before promotion (see the DAG-validation sketch after this list).
- Support environment isolation across dev, staging, and prod with configuration managed separately from code.
- Ensure deployments are idempotent and safe for reruns; avoid duplicate loads or partially applied schema changes (see the rerun-safe load sketch after this list).
- Include a strategy for database migrations, backfills, and rollback of failed releases.
- Define monitoring and alerting for deployment failures, DAG import errors, and post-deploy data quality regressions.
- Explain how secrets, credentials, and access control are handled.
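
To make the "DAG parsing before promotion" requirement concrete, here is a minimal sketch of a CI check using pytest and Airflow's DagBag. The dags/ folder layout and the owner/retries policy checks are illustrative assumptions, not part of the brief.

    # test_dag_validation.py -- minimal CI sketch (pytest), run before promotion.
    # Assumes the CI image has Airflow installed and DAG files live under dags/.
    import pytest
    from airflow.models import DagBag

    DAG_FOLDER = "dags"  # assumed repo layout


    @pytest.fixture(scope="session")
    def dag_bag():
        # Parse every DAG file once; include_examples=False skips Airflow's samples.
        return DagBag(dag_folder=DAG_FOLDER, include_examples=False)


    def test_no_import_errors(dag_bag):
        # Any file that fails to parse shows up in import_errors; fail fast in CI.
        assert dag_bag.import_errors == {}, f"DAG import errors: {dag_bag.import_errors}"


    def test_dags_have_owner_and_retries(dag_bag):
        # Example policy checks -- the specific rules here are assumptions.
        for dag_id, dag in dag_bag.dags.items():
            assert dag.default_args.get("owner"), f"{dag_id} is missing an owner"
            assert dag.default_args.get("retries", 0) >= 1, f"{dag_id} should retry at least once"

Running this in the CI job before promotion catches broken imports and policy violations without deploying anything to the Airflow environment.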
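
To illustrate the rerun-safety requirement, here is a minimal sketch of an idempotent load pattern built on a Snowflake MERGE keyed on a natural key plus a batch partition. The table names, columns, and load_date key are assumptions for illustration; a real pipeline would bind parameters rather than interpolate strings.

    # rerun_safe_load.py -- sketch of a rerun-safe (idempotent) batch load pattern.
    # Table/column names and the load_date partition key are illustrative assumptions.

    def build_upsert_sql(target: str, staging: str, key_cols: list[str], load_date: str) -> str:
        """Build a Snowflake MERGE so re-running the same batch never duplicates rows."""
        on_clause = " AND ".join(f"t.{c} = s.{c}" for c in key_cols)
        return f"""
        MERGE INTO {target} AS t
        USING (SELECT * FROM {staging} WHERE load_date = '{load_date}') AS s
          ON {on_clause}
        WHEN MATCHED THEN UPDATE SET t.amount = s.amount, t.load_date = s.load_date
        WHEN NOT MATCHED THEN INSERT (txn_id, amount, load_date)
          VALUES (s.txn_id, s.amount, s.load_date);
        """


    if __name__ == "__main__":
        # Re-running the same batch (same load_date) updates rows in place
        # instead of inserting duplicates, which is what makes the rerun safe.
        print(build_upsert_sql("ledger.transactions", "staging.transactions",
                               ["txn_id"], "2024-01-01"))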
Constraints
- Existing stack is AWS-based and must remain there.
- Team size is 3 data engineers and 1 platform engineer; solution should minimize operational overhead.
- Monthly incremental tooling budget is capped at $8K.
- Financial data requires auditability of every deployment and separation of duties for production approval.