Context
Northbeam Analytics runs batch and near-real-time data pipelines on AWS using Airflow, EMR Serverless, S3, and Snowflake. Today, infrastructure for DAGs, IAM roles, S3 buckets, and compute environments is created manually across dev, staging, and prod, causing drift, failed deployments, and inconsistent access controls.
You need to design how the team should manage pipeline infrastructure as code with Terraform so new data pipelines can be provisioned reproducibly, reviewed through Git, and promoted safely across environments.
Scale Requirements
- Environments: 3 isolated environments (dev, staging, prod)
- Pipelines: 120 Airflow DAGs, 35 batch Spark jobs, 8 streaming jobs
- Deployments: 20-30 Terraform applies per week
- Storage: 1.5 PB in S3 across raw, staging, and curated zones
- Latency target: Infrastructure changes promoted to prod within 30 minutes of approval
- Team size: 10 data engineers, 2 platform engineers
Requirements
- Define a Terraform structure for reusable modules covering S3, IAM, Airflow connections, EMR Serverless applications, and Snowflake objects (a layout sketch follows this list).
- Support environment-specific configuration without duplicating code.
- Design CI/CD for terraform fmt, validate, plan, policy checks, and controlled apply (the IAM sketch after this list shows one way to gate apply).
- Manage remote state, state locking, and secret handling securely (a backend sketch follows this list).
- Prevent destructive changes to production data stores and shared pipeline resources (see the prevent_destroy sketch below).
- Include strategies for drift detection, module versioning, and rollback (tag pinning in the layout sketch below covers versioning and rollback).
- Explain how Terraform changes integrate with pipeline orchestration and deployment workflows.
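A minimal layout sketch for the first two items, assuming a shared terraform-modules repository and a thin root module per environment; the repo URL, module name, and variables are illustrative, not prescribed:

```hcl
# envs/prod/main.tf -- each environment is a thin root module that pins
# module versions and supplies environment-specific values; dev and
# staging call the same modules with their own *.tfvars files.
# (Provider configuration omitted for brevity.)

variable "environment" {
  type = string # "prod" here, set via envs/prod/prod.tfvars
}

variable "raw_retention_days" {
  type    = number
  default = 365
}

module "raw_zone" {
  # Pinning to a git tag versions the module. Rollback is re-pinning to
  # the previous known-good tag and re-applying; drift detection is a
  # scheduled `terraform plan -detailed-exitcode` against each root.
  source = "git::https://github.com/northbeam/terraform-modules.git//s3_data_zone?ref=v1.4.0"

  environment    = var.environment
  zone           = "raw"
  retention_days = var.raw_retention_days
}
```

The same pattern extends to the IAM, Airflow connection, EMR Serverless, and Snowflake modules; only the pinned tags and the tfvars differ between environments.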
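For the controlled-apply stage, one workable split, sketched under the assumption that CI authenticates through a GitHub Actions OIDC provider (the provider variable, repo path, and role names are assumptions): plan runs read-only on every pull request, while apply uses a separate role that is only reachable through the approval-gated prod environment.

```hcl
variable "github_oidc_provider_arn" {
  type = string
}

# Read-only role assumed for `terraform plan` on pull requests.
resource "aws_iam_role" "tf_plan" {
  name = "tf-ci-plan"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Federated = var.github_oidc_provider_arn }
      Action    = "sts:AssumeRoleWithWebIdentity"
      Condition = {
        StringLike = {
          "token.actions.githubusercontent.com:sub" = "repo:northbeam/data-platform-infra:*"
        }
      }
    }]
  })
}

resource "aws_iam_role_policy_attachment" "tf_plan_readonly" {
  role       = aws_iam_role.tf_plan.name
  policy_arn = "arn:aws:iam::aws:policy/ReadOnlyAccess"
}

# Write-capable role for `terraform apply`. The trust policy matches only
# the protected prod environment, so it cannot be assumed without passing
# the approval gate guarding that environment; the gate's review history
# also provides the change traceability SOC 2 asks for.
resource "aws_iam_role" "tf_apply" {
  name = "tf-ci-apply"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Federated = var.github_oidc_provider_arn }
      Action    = "sts:AssumeRoleWithWebIdentity"
      Condition = {
        StringEquals = {
          "token.actions.githubusercontent.com:sub" = "repo:northbeam/data-platform-infra:environment:prod"
        }
      }
    }]
  })
}
```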
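Since Terraform Cloud is not approved, remote state can live in an S3 backend with DynamoDB locking, one state file per environment; the bucket, key, and table names below are placeholders:

```hcl
# envs/prod/backend.tf -- a separate state file per environment keeps the
# blast radius of a bad apply contained; the DynamoDB table provides
# locking so two of the 12 engineers cannot apply concurrently.
terraform {
  backend "s3" {
    bucket         = "northbeam-terraform-state" # versioned, SSE-encrypted
    key            = "prod/data-platform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}
```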
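To block destructive changes to production data stores, prevent_destroy fails any plan that would delete the resource; a CI policy check that rejects delete actions on prod-tagged resources in the plan JSON adds a second, reviewable layer. A sketch with an illustrative bucket name:

```hcl
resource "aws_s3_bucket" "curated" {
  bucket = "northbeam-prod-curated"

  lifecycle {
    # Terraform refuses any plan that would destroy this bucket,
    # including a `terraform destroy` or an attribute change that
    # forces replacement.
    prevent_destroy = true
  }
}
```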
Constraints
- AWS is the primary cloud; Terraform Cloud is not approved.
- Secrets must remain in AWS Secrets Manager and cannot be stored in state in plaintext (see the ARN-only sketch after this list).
- Production changes require approval and audit logs.
- Monthly platform tooling budget is capped at $8K.
- SOC 2 controls require least-privilege IAM and change traceability.
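One way to satisfy the Secrets Manager constraint is to keep secret values out of Terraform entirely: reference only the secret's ARN (metadata, not the value, lands in state) and let the pipeline resolve the value at runtime. A sketch, assuming a Snowflake loader credential already exists under an illustrative name:

```hcl
# Looks up metadata only; the secret value is never read into state.
data "aws_secretsmanager_secret" "snowflake" {
  name = "prod/snowflake/loader"
}

# Illustrative execution role for EMR Serverless jobs.
resource "aws_iam_role" "pipeline_exec" {
  name = "pipeline-exec"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Service = "emr-serverless.amazonaws.com" }
      Action    = "sts:AssumeRole"
    }]
  })
}

# Least-privilege grant: the job resolves this one secret at runtime
# (e.g., from a Spark job or an Airflow connection), so the plaintext
# never passes through Terraform plans, state, or CI logs.
resource "aws_iam_role_policy" "read_snowflake_secret" {
  name = "read-snowflake-secret"
  role = aws_iam_role.pipeline_exec.id
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = "secretsmanager:GetSecretValue"
      Resource = data.aws_secretsmanager_secret.snowflake.arn
    }]
  })
}
```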