Context
NovaRetail runs Airflow, Spark batch jobs, and a small fleet of Kafka consumers on Amazon EKS. The platform team currently deploys every pipeline component as a generic Kubernetes Deployment, which has caused three recurring problems: stateful workers lack stable identities, restarts trigger duplicate processing, and node-level log collection is inconsistent across the fleet.
You need to explain when to use Deployments, StatefulSets, and DaemonSets in this data platform, and propose how each should be applied to pipeline workloads.
Scale Requirements
- Cluster size: 60 EKS worker nodes across 3 AZs
- Airflow workloads: 2 schedulers, 20-200 ephemeral task pods/day
- Streaming consumers: 48 Kafka partitions, target consumer lag < 30 seconds
- Stateful services: 3 metadata/cache replicas with persistent volumes
- Node-level agents: 1 pod per node for logs and metrics
- Availability target: 99.9% for orchestration and ingestion services
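To make the streaming targets concrete: with 48 partitions, a consumer group can run at most 48 consumers, and the replica count otherwise follows from throughput. A minimal sizing sketch is below; the ingest rate and per-consumer throughput figures are illustrative assumptions, not measured values from NovaRetail's cluster.

```python
import math

def consumers_needed(partitions: int,
                     ingest_rate: float,
                     per_consumer_rate: float) -> int:
    """Minimum consumer replicas to keep up with ingest_rate (msgs/s),
    capped at the partition count, since Kafka assigns at most one
    consumer per partition within a consumer group."""
    if per_consumer_rate <= 0:
        raise ValueError("per-consumer throughput must be positive")
    needed = math.ceil(ingest_rate / per_consumer_rate)
    return min(max(needed, 1), partitions)

# Illustrative numbers: 12,000 msgs/s total, 500 msgs/s per consumer.
print(consumers_needed(48, 12_000, 500))  # -> 24 replicas for 48 partitions
```

If the computed figure hits the partition cap, lag can only be reduced by raising per-consumer throughput or repartitioning the topic, not by adding replicas.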
Requirements
- Describe the operational differences between Deployments, StatefulSets, and DaemonSets in Kubernetes.
- Map each controller to concrete data engineering workloads such as Airflow webserver/scheduler, Kafka consumers, metadata databases, and node-level observability agents.
- Explain implications for scaling, pod identity, storage, rolling updates, and failure recovery.
- Show how you would deploy at least one stateless service and one node-level agent on EKS.
- Define monitoring and alerting for rollout failures, pod churn, unavailable replicas, and node coverage.
- Discuss trade-offs if the team wants to minimize operational complexity while preserving reliability for ETL and streaming jobs.
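For the "show how you would deploy" requirement, one answer shape is a pair of manifests: a Deployment for a stateless service and a DaemonSet for a node-level agent. The sketch below builds both as plain dicts; the names, images, and replica counts are assumptions for illustration, and `kubectl apply -f` accepts the JSON output directly.

```python
import json

def deployment(name: str, image: str, replicas: int) -> dict:
    """Stateless service: interchangeable pods, zero-downtime rolling updates."""
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": {"app": name}},
            # Surge one pod at a time so the service never drops below capacity.
            "strategy": {"type": "RollingUpdate",
                         "rollingUpdate": {"maxUnavailable": 0, "maxSurge": 1}},
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }

def daemonset(name: str, image: str) -> dict:
    """Node-level agent: exactly one pod per schedulable node, no replica count."""
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": name},
        "spec": {
            "selector": {"matchLabels": {"app": name}},
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {
                    "containers": [{"name": name, "image": image}],
                    # Tolerate all taints so logs are collected from every node.
                    "tolerations": [{"operator": "Exists"}],
                },
            },
        },
    }

# Hypothetical workloads: Airflow webserver (stateless) and a log agent.
web = deployment("airflow-webserver", "apache/airflow:2.9.2", replicas=2)
agent = daemonset("fluent-bit", "fluent/fluent-bit:3.0")
print(json.dumps(web, indent=2))
```

Note the structural difference: the DaemonSet spec has no `replicas` field at all, because the scheduler derives the pod count from the node count, which is exactly why it fits the one-pod-per-node log-collection requirement.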
Constraints
- AWS-first stack: EKS, EBS, CloudWatch, Prometheus, Grafana
- Small platform team: 3 engineers supporting all data infrastructure
- Budget-sensitive: avoid overprovisioning dedicated nodes
- Compliance: production logs must be collected from every node and retained for 30 days
- Existing workloads cannot tolerate more than 5 minutes of orchestration downtime
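Two of these constraints reduce to checkable conditions: the compliance rule means the log DaemonSet must cover every node, and the downtime budget means an orchestration rollout with unavailable replicas may not persist past 5 minutes. A sketch of both checks as pure functions over metrics the team would already scrape (the metric semantics mirror kube-state-metrics fields, but that mapping is an assumption to verify against the installed exporter):

```python
def daemonset_coverage_ok(desired: int, ready: int) -> bool:
    """Compliance check: the log agent must be ready on every node
    the DaemonSet schedules onto (desired = node count it targets)."""
    return desired > 0 and ready >= desired

def rollout_stalled(unavailable_replicas: int,
                    minutes_unavailable: float,
                    budget_minutes: float = 5.0) -> bool:
    """Downtime-budget check: orchestration may not have unavailable
    replicas for longer than the 5-minute budget."""
    return unavailable_replicas > 0 and minutes_unavailable > budget_minutes

# 60-node cluster with 58 ready agent pods: coverage gap, page the on-call.
print(daemonset_coverage_ok(desired=60, ready=58))
# A scheduler replica unavailable for 7 minutes: budget exceeded.
print(rollout_stalled(unavailable_replicas=1, minutes_unavailable=7.0))
```

In practice these would live as Prometheus alert rules rather than application code, but expressing them as functions keeps the thresholds reviewable and unit-testable by a three-engineer team.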