Context
FinEdge, a mid-size fintech company, runs 40 batch ETL and ELT jobs that move data from PostgreSQL, S3, and third-party APIs into Snowflake. Today Jenkins handles only basic application CI; data jobs are deployed manually, which causes inconsistent releases, missed dependency checks, and failed backfills.
You are asked to design a Jenkins-based pipeline framework for data engineering workloads that standardizes build, test, deploy, and scheduled execution for Python ETL jobs, dbt transformations, and Airflow DAG releases.
Scale Requirements
- Pipelines: 120 Jenkins pipelines across dev, staging, and prod
- Deploy frequency: 30-50 data releases per day
- Batch jobs: 40 scheduled jobs, 10 backfills/week
- Latency: CI validation < 10 minutes; production deployment < 15 minutes
- Artifacts: ~300 MB Docker image per ETL service; 2 TB/day processed downstream
- Reliability target: 99.5% successful scheduled runs per month
Requirements
- Design a Jenkins pipeline template for Python ETL, dbt projects, and Airflow DAG deployment.
- Include stages for code checkout, unit tests, data quality tests, packaging, container build, and environment promotion (a template sketch follows this list).
- Support parameterized backfill runs over explicit date ranges while preventing duplicate loads (see the backfill sketch below).
- Enforce dependency ordering so upstream ingestion jobs complete before downstream dbt or Airflow-triggered jobs run (see the orchestration sketch below).
- Implement secrets management, role-based access, and auditability for production releases (see the approval-gate sketch below).
- Define rollback and re-run behavior for failed deployments and failed scheduled executions (see the rollback sketch below).
- Describe how Jenkins integrates with GitHub, Docker, Kubernetes, Snowflake, and Airflow.
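A minimal declarative Jenkinsfile sketch of the shared template, assuming a pod-based agent on EKS labeled `etl-agent`; the registry value, test commands, and manifest paths are placeholders, but the stage sequence maps one-to-one to the required stages.

```groovy
pipeline {
    agent { label 'etl-agent' }                      // agent pod on EKS
    options { timeout(time: 30, unit: 'MINUTES') }   // guardrail; CI stages should finish well under 10
    environment {
        ECR_REPO = 'REPLACE_ME.dkr.ecr.us-east-1.amazonaws.com/etl'  // placeholder registry
    }
    stages {
        stage('Checkout') {
            steps { checkout scm }                   // source from the GitHub multibranch job
        }
        stage('Unit Tests') {
            steps { sh 'pip install -r requirements.txt && pytest tests/unit' }
        }
        stage('Data Quality Tests') {
            steps { sh 'dbt deps && dbt test --target dev' }  // or a Great Expectations suite
        }
        stage('Package and Build Image') {
            steps {
                sh 'docker build -t "$ECR_REPO:$GIT_COMMIT" .'
                sh 'docker push "$ECR_REPO:$GIT_COMMIT"'
            }
        }
        stage('Deploy to Staging') {
            steps { sh 'kubectl apply -f k8s/staging/' }
        }
        stage('Promote to Prod') {
            when { branch 'main' }                   // only mainline builds may promote
            steps {
                input message: 'Promote this build to prod?', submitter: 'data-platform'
                sh 'kubectl apply -f k8s/prod/'
            }
        }
    }
}
```

The same skeleton can serve dbt-only and Airflow DAG-release pipelines by swapping the test and deploy commands; hosting it in a Jenkins shared library lets all 120 pipelines inherit one definition.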
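For the backfill requirement above, a sketch assuming a hypothetical `backfill.py` entry point that records completed partitions in a load-audit table and skips them unless `--force` is passed; the `lock` step (Lockable Resources plugin) serializes backfills per table so two concurrent runs cannot double-load.

```groovy
pipeline {
    agent { label 'etl-agent' }
    parameters {
        string(name: 'START_DATE',   defaultValue: '', description: 'Backfill start, YYYY-MM-DD')
        string(name: 'END_DATE',     defaultValue: '', description: 'Backfill end, YYYY-MM-DD')
        string(name: 'TARGET_TABLE', defaultValue: 'orders', description: 'Snowflake table to backfill')
        booleanParam(name: 'FORCE',  defaultValue: false, description: 'Reload partitions already marked loaded')
    }
    stages {
        stage('Backfill') {
            steps {
                // One backfill per table at a time; a second build queues here
                lock(resource: "backfill-${params.TARGET_TABLE}") {
                    sh """
                        python backfill.py \\
                          --table '${params.TARGET_TABLE}' \\
                          --start '${params.START_DATE}' --end '${params.END_DATE}' \\
                          ${params.FORCE ? '--force' : '--skip-loaded'}
                    """
                }
            }
        }
    }
}
```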
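One way to meet the ordering requirement is an umbrella pipeline that chains the per-job pipelines with the built-in `build` step; any upstream failure aborts the run before dbt or Airflow jobs can read partial data. Job names are placeholders for pipelines created from the template.

```groovy
pipeline {
    agent none   // the build step needs no workspace, so no executor is held
    stages {
        stage('Ingest') {
            steps {
                build job: 'ingest-postgres-to-snowflake', wait: true
                build job: 'ingest-s3-to-snowflake', wait: true
            }
        }
        stage('Transform') {
            steps { build job: 'dbt-run-and-test', wait: true }   // runs only after both ingests succeed
        }
        stage('Release DAGs') {
            steps { build job: 'airflow-dag-release', wait: true }
        }
    }
}
```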
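For the production-release controls, a sketch of the approval gate: `input` records the approver in the build record, which supplies the required audit trail, and `withCredentials` injects Snowflake credentials from the Jenkins credential store, where the 90-day rotation happens without touching any Jenkinsfile. The credential ID and approver group are placeholders; role-based access itself is configured in Jenkins authorization (e.g. the Role Strategy plugin), not in the pipeline.

```groovy
pipeline {
    agent { label 'etl-agent' }
    stages {
        stage('Prod Approval') {
            steps {
                // Approver identity and timestamp are captured in the build record
                input message: 'Approve production release?', submitter: 'data-leads'
            }
        }
        stage('Deploy') {
            steps {
                withCredentials([usernamePassword(credentialsId: 'snowflake-prod',
                                                  usernameVariable: 'SF_USER',
                                                  passwordVariable: 'SF_PASSWORD')]) {
                    // Values are masked in console output; deploy.py reads them
                    // from the environment rather than from command-line flags
                    sh 'python deploy.py'
                }
            }
        }
    }
}
```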
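Finally, a rollback sketch assuming ETL services run as Kubernetes Deployments (the deployment and container names are placeholders): `retry` gives transient deploy failures one automatic re-run, and the `post { failure }` block reverts to the previous revision. Failed scheduled executions are re-run through the parameterized backfill job above with the affected date range, so re-runs stay idempotent.

```groovy
pipeline {
    agent { label 'etl-agent' }
    stages {
        stage('Deploy') {
            steps {
                retry(2) {   // one automatic re-run for transient failures
                    // ECR_REPO is set as in the template sketch above
                    sh 'kubectl set image deployment/etl-orders etl="$ECR_REPO:$GIT_COMMIT"'
                    sh 'kubectl rollout status deployment/etl-orders --timeout=120s'
                }
            }
        }
    }
    post {
        failure {
            // Revert to the previously deployed ReplicaSet revision
            sh 'kubectl rollout undo deployment/etl-orders'
        }
    }
}
```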
Constraints
- Existing CI/CD standard must remain Jenkins; no migration to GitHub Actions or GitLab CI.
- Infrastructure is AWS-based with EKS, S3, and Snowflake already provisioned.
- Team has 3 data engineers and 1 platform engineer, so operational overhead must stay low.
- Compliance requires change history, approval gates for prod, and secret rotation every 90 days.