Context
NovaRetail runs its batch ETL platform on Kubernetes and wants junior data engineers to understand the core cluster components before they operate Airflow, Spark, and dbt workloads in production. The current issue is frequent confusion during incident response: engineers can deploy pipelines, but cannot explain how the Kubernetes control plane, worker nodes, Kubelet, and etcd affect scheduling, job execution, and recovery.
You are asked to explain, and lightly design, the Kubernetes architecture that supports a containerized data platform.
Scale Requirements
- Cluster size: 1 control plane, 12 worker nodes
- Workloads: 800 Airflow task pods/day, 120 Spark driver/executor pods/day, 40 dbt job pods/day (a quick throughput check follows this list)
- Latency target: New batch job pod scheduled within 30 seconds under normal load
- Storage: etcd database kept well under etcd's recommended ~8 GB limit; pipeline logs retained 30 days in object storage
- Availability: 99.9% scheduler/API availability during business hours
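Before getting into the requirements, a back-of-the-envelope check puts these numbers in perspective. This is a sketch only; the 3x peak factor is an assumption for illustration, not part of the spec.

```python
# Back-of-the-envelope check: pod churn vs. the 30-second scheduling target.

AIRFLOW_PODS_PER_DAY = 800
SPARK_PODS_PER_DAY = 120
DBT_PODS_PER_DAY = 40
PEAK_FACTOR = 3  # assumed: batch windows concentrate load; not from the spec

total_per_day = AIRFLOW_PODS_PER_DAY + SPARK_PODS_PER_DAY + DBT_PODS_PER_DAY
avg_per_minute = total_per_day / (24 * 60)
peak_per_minute = avg_per_minute * PEAK_FACTOR

print(f"total pods/day:   {total_per_day}")        # 960
print(f"average pods/min: {avg_per_minute:.2f}")   # ~0.67
print(f"assumed peak/min: {peak_per_minute:.2f}")  # ~2.00
```

Even at the assumed peak, roughly two pod creations per minute is far below what a single default kube-scheduler can handle, so meeting the 30-second target is mostly a question of node capacity and image pull time, not scheduler throughput.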
Requirements
- Describe the role of the control plane in managing cluster state, scheduling, and API access for data workloads.
- Explain what worker nodes do and how they run ETL containers such as Airflow workers or Spark executors.
- Explain the Kubelet's lifecycle responsibilities on each node, including node registration with the API server, pod health and status reporting, and container restart behavior.
- Describe how etcd stores cluster state and why it is critical for orchestration reliability.
- Walk through what happens when an Airflow DAG launches a KubernetesPodOperator task, from API request to running pod (a minimal DAG sketch follows this list).
- Identify at least three failure scenarios and how they affect pipeline execution and recovery.
- Propose basic monitoring and operational checks for these components (an example health check follows this list).
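To ground the DAG-to-pod walkthrough, here is a minimal sketch of an Airflow DAG that launches one KubernetesPodOperator task. The DAG id, namespace, and image are hypothetical, and the import path and `schedule` argument vary with the Airflow core and cncf.kubernetes provider versions installed.

```python
from datetime import datetime

from airflow import DAG
# Import path used by recent cncf.kubernetes provider releases; older
# releases expose the operator under operators.kubernetes_pod instead.
from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator

with DAG(
    dag_id="orders_nightly_etl",        # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",                  # "schedule_interval" on Airflow < 2.4
    catchup=False,
) as dag:
    extract_orders = KubernetesPodOperator(
        task_id="extract_orders",
        name="extract-orders",
        namespace="etl",                      # hypothetical namespace
        image="novaretail/etl-extract:1.4",   # hypothetical image
        cmds=["python", "extract.py"],
        get_logs=True,  # stream container logs back into the Airflow UI
    )
```

When this task fires, the Airflow worker submits a Pod manifest to the API server, the API server persists the object in etcd, the scheduler binds the pod to a worker node, and that node's Kubelet pulls the image and starts the container.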
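For the monitoring and failure-scenario requirements, the sketch below uses the official `kubernetes` Python client, assuming kubeconfig access to the EKS cluster; the 30-second threshold comes from the latency target above. Nodes stuck NotReady usually mean the Kubelet has stopped heartbeating, while pods stuck Pending point at scheduler backlog or exhausted node capacity.

```python
from datetime import datetime, timezone

from kubernetes import client, config

PENDING_THRESHOLD_S = 30  # mirrors the scheduling latency target above

config.load_kube_config()  # use config.load_incluster_config() inside a pod
v1 = client.CoreV1Api()

# Check 1: nodes whose Kubelet has stopped reporting Ready.
for node in v1.list_node().items:
    for cond in node.status.conditions or []:
        if cond.type == "Ready" and cond.status != "True":
            print(f"ALERT: node {node.metadata.name} is NotReady "
                  f"(Kubelet heartbeat or node failure)")

# Check 2: pods stuck Pending past the latency target, which suggests
# scheduler backlog, quota limits, or no node with free capacity.
now = datetime.now(timezone.utc)
pending = v1.list_pod_for_all_namespaces(field_selector="status.phase=Pending")
for pod in pending.items:
    age_s = (now - pod.metadata.creation_timestamp).total_seconds()
    if age_s > PENDING_THRESHOLD_S:
        print(f"ALERT: pod {pod.metadata.namespace}/{pod.metadata.name} "
              f"Pending for {age_s:.0f}s")
```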
Constraints
- AWS-based environment using Amazon EKS
- Small platform team: 3 data engineers, 1 DevOps engineer
- No custom Kubernetes operators beyond standard Airflow and Spark-on-K8s deployment patterns
- Must support auditability for job execution and infrastructure changes (a log-export sketch follows)
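To make the auditability constraint concrete: EKS can export control-plane logs, including the API-server audit stream that records every job pod launch and infrastructure change made through the API, to CloudWatch Logs. A minimal boto3 sketch, assuming a hypothetical cluster name and region:

```python
import boto3

# Turn on EKS control-plane log export to CloudWatch Logs. The "audit"
# stream records every API-server request, covering both ETL pod launches
# and infrastructure changes made through the Kubernetes API.
eks = boto3.client("eks", region_name="us-east-1")  # region is a placeholder

eks.update_cluster_config(
    name="novaretail-etl",  # hypothetical cluster name
    logging={
        "clusterLogging": [
            {"types": ["api", "audit", "authenticator"], "enabled": True}
        ]
    },
)
```

Pairing these control-plane logs with the Airflow task logs already retained in object storage gives the small platform team an audit trail for both job execution and cluster changes without any custom operators.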