Context
FinSight, a fintech analytics company, runs 120 production data pipelines across batch and streaming workloads. The current deployment process is manual: engineers merge code to GitHub, trigger Airflow DAG updates by hand, and apply dbt changes directly in production, causing frequent schema drift, broken dependencies, and rollback delays.
You need to design a CI/CD system for the company's data platform that supports safe, automated deployment of pipeline code, SQL transformations, and infrastructure changes. The platform is AWS-based and includes Apache Airflow, dbt, Kafka, Spark, and Snowflake.
Scale Requirements
- Pipelines: 120 production DAGs, 40 dbt models, 15 Spark jobs, 8 Kafka topics
- Deployments: ~30 code changes/day across 12 engineers
- Data volume: 6 TB/day batch, 80K events/sec streaming peak
- Latency target: CI validation < 15 minutes; production deployment < 10 minutes
- Availability target: 99.9% for orchestration and transformation layers
- Rollback target: Restore previous stable version within 5 minutes
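The scale figures above imply a few derived numbers that are useful when sizing the CI/CD system. The sketch below computes them under assumptions that the requirements do not state: decimal units (1 TB = 10^12 bytes), batch volume spread evenly over 24 hours, and a 30-day month as the window for the 99.9% availability target. The function names are illustrative only.

```python
# Back-of-envelope figures derived from the scale requirements.
# Assumptions (not stated in the requirements): decimal units (1 TB = 10**12 bytes),
# an even spread of batch volume over the day, and a 30-day availability window.

SECONDS_PER_DAY = 24 * 60 * 60


def avg_batch_throughput_mb_s(tb_per_day: float) -> float:
    """Average ingest rate in MB/s if the daily batch volume were spread evenly."""
    return tb_per_day * 10**12 / SECONDS_PER_DAY / 10**6


def downtime_budget_minutes(availability: float, days: int = 30) -> float:
    """Minutes of allowed unavailability over `days` at the given availability."""
    return (1.0 - availability) * days * 24 * 60


def rollbacks_within_budget(budget_min: float, rollback_min: float) -> int:
    """How many full-length rollbacks fit inside the downtime budget."""
    return int(budget_min // rollback_min)


if __name__ == "__main__":
    print(f"Average batch rate: {avg_batch_throughput_mb_s(6):.1f} MB/s")
    budget = downtime_budget_minutes(0.999)
    print(f"99.9% monthly downtime budget: {budget:.1f} min")
    print(f"5-minute rollbacks that fit: {rollbacks_within_budget(budget, 5)}")
```

Under these assumptions the average batch rate is about 69.4 MB/s (real loads are burstier, so this is a floor), and 99.9% allows roughly 43.2 minutes of downtime per month. If every rollback cost the full 5 minutes of unavailability, only eight would fit in that budget, which argues for catching most failures in CI rather than relying on rollback.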
Requirements
- Design a CI/CD workflow for pipeline code, SQL models, and infrastructure-as-code.
- Include automated testing for DAG validity, dbt model correctness, Spark job packaging, and schema compatibility.
- Support environment promotion from dev to staging to prod, with approval gates.
- Ensure deployments are idempotent and safe for backfills, retries, and partial failures.
- Define how secrets, configuration, and environment-specific variables are managed.
- Include monitoring for deployment failures, data quality regressions, and post-release pipeline health.
- Provide a rollback strategy for failed Airflow, dbt, or streaming deployments.
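Of the automated tests listed above, schema compatibility is the one most often left to convention. Below is a minimal sketch of a CI gate, assuming a deliberately conservative "additive-only" policy for shared tables and topics: columns may be added only if nullable, and existing columns may not be removed, retyped, or tightened from nullable to NOT NULL. The `Column` type, the `breaking_changes` function, and the example column names are hypothetical. A real check would read schemas from Snowflake, dbt model contracts, or a Kafka schema registry, each of which has its own compatibility semantics.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    type: str
    nullable: bool = False


def breaking_changes(old: list[Column], new: list[Column]) -> list[str]:
    """Return policy violations; an empty list means the change is allowed.

    Hypothetical "additive-only" policy, stricter than most registry modes:
    - existing columns must keep their name and type (exact string match,
      so even safe widenings such as NUMBER(18,2) -> NUMBER(38,2) are flagged);
    - a nullable column must not become NOT NULL;
    - newly added columns must be nullable.
    """
    violations: list[str] = []
    new_by_name = {c.name: c for c in new}
    old_names = {c.name for c in old}

    for c in old:
        cur = new_by_name.get(c.name)
        if cur is None:
            violations.append(f"removed column: {c.name}")
        elif cur.type != c.type:
            violations.append(f"type change on {c.name}: {c.type} -> {cur.type}")
        elif c.nullable and not cur.nullable:
            violations.append(f"nullable column became NOT NULL: {c.name}")

    for c in new:
        if c.name not in old_names and not c.nullable:
            violations.append(f"new NOT NULL column: {c.name}")

    return violations


if __name__ == "__main__":
    old = [Column("txn_id", "VARCHAR"), Column("amount", "NUMBER(18,2)")]
    ok = old + [Column("currency", "VARCHAR", nullable=True)]
    bad = [Column("txn_id", "VARCHAR"), Column("amount", "FLOAT")]
    print(breaking_changes(old, ok))
    print(breaking_changes(old, bad))
```

In a pull request check, a non-empty result would fail the build before the change is promoted to staging, which also gives the SOX audit trail a recorded reason for each blocked change.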
Constraints
- Must use GitHub as the source control system.
- AWS is the only approved cloud; Terraform is already used for infrastructure.
- Production changes require auditability for SOX compliance.
- The team budget allows managed services but not a full platform rewrite.
- Streaming jobs cannot tolerate more than 2 minutes of deployment interruption.
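The 2-minute interruption limit also bounds how much backlog a streaming redeploy can create. The sketch below assumes that consumers resume from their committed Kafka offsets after a restart and that load stays at the 80K events/sec peak throughout. It computes the resulting backlog and the throughput needed to drain it. The 5-minute catch-up window and the per-phase deployment timings are illustrative values, not requirements, and the function names are hypothetical.

```python
from __future__ import annotations


def backlog_events(peak_eps: int, pause_s: int) -> int:
    """Events that accumulate in Kafka while consumers are paused at peak load."""
    return peak_eps * pause_s


def required_catchup_eps(peak_eps: int, pause_s: int, catchup_s: int) -> float:
    """Consumer throughput needed to drain the backlog within catchup_s seconds
    while new events keep arriving at the peak rate."""
    return peak_eps + backlog_events(peak_eps, pause_s) / catchup_s


def within_interruption_budget(phase_seconds: list[float], budget_s: float = 120.0) -> bool:
    """True if the summed pause of all deployment phases fits the budget."""
    return sum(phase_seconds) <= budget_s


if __name__ == "__main__":
    PEAK_EPS = 80_000  # streaming peak from the scale requirements
    PAUSE_S = 120      # worst case allowed by the 2-minute constraint
    CATCHUP_S = 300    # hypothetical target: back to real time within 5 minutes

    print(f"Backlog: {backlog_events(PEAK_EPS, PAUSE_S):,} events")
    print(f"Required catch-up rate: {required_catchup_eps(PEAK_EPS, PAUSE_S, CATCHUP_S):,.0f} events/sec")

    # Hypothetical timings for drain, job restart, and warm-up phases.
    print(within_interruption_budget([30, 45, 20]))
```

With these inputs a full 2-minute pause leaves 9.6 million events queued, and draining them in 5 minutes needs about 112,000 events/sec, or 1.4x the peak rate. Consumer capacity and topic retention would both need that headroom for a redeploy to stay within the constraint.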