Context
ShopWave, a retail marketplace, currently ingests order, inventory, and clickstream events through REST APIs into S3 and runs hourly Spark batch jobs for downstream analytics. The platform now needs sub-minute freshness for fraud detection and inventory updates, and the data team must decide whether to standardize on Apache Kafka, Apache Flink, or both as the core of the new streaming architecture.
Your task is to design the target pipeline and justify which role, if any, Kafka and Flink each play in it, rather than treating them as interchangeable tools.
Scale Requirements
- Sources: web/mobile clickstream, order service CDC, inventory service events, payment webhooks
- Peak throughput: 180K events/sec, average 45K events/sec (see the sizing sketch after this list)
- Event size: 1-4 KB JSON/Avro
- Latency target: fraud signals < 10 seconds, analytics aggregates < 2 minutes
- Retention: raw events for 30 days, curated warehouse tables for 2 years
- Availability target: 99.9% for ingestion and processing
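For orientation, here is a back-of-envelope sizing from the numbers above, sketched in Python. The ~10 MB/s-per-partition write rate and the worst-case 4 KB event size are planning assumptions, not measurements:

```python
# Rough sizing from the stated workload; all figures are upper bounds.
PEAK_EVENTS_PER_SEC = 180_000
AVG_EVENTS_PER_SEC = 45_000
MAX_EVENT_BYTES = 4 * 1024      # events are 1-4 KB; use the worst case
PARTITION_MBPS = 10             # assumed sustainable write rate per Kafka partition

peak_mb_per_sec = PEAK_EVENTS_PER_SEC * MAX_EVENT_BYTES / 1_000_000
raw_30d_tb = AVG_EVENTS_PER_SEC * MAX_EVENT_BYTES * 86_400 * 30 / 1e12

print(f"peak ingest:       ~{peak_mb_per_sec:.0f} MB/s")   # ~737 MB/s at peak
print(f"partitions needed: ~{peak_mb_per_sec / PARTITION_MBPS:.0f} (before replication)")
print(f"30-day raw volume: ~{raw_30d_tb:.0f} TB (uncompressed, single copy)")
```

Replication (typically 3x) and compression will shift these figures, but they bound the broker fleet and the 30-day retention cost regardless of how the Kafka/Flink split lands.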
Functional Requirements
- Design an end-to-end streaming pipeline from producers to serving layers.
- Explain which responsibilities belong to Kafka vs Flink: ingestion, buffering, replay, stateful processing, windowing, joins, and delivery.
- Support exactly-once or effectively-once processing for order and payment events (see the transactional-producer sketch after this list).
- Implement schema validation, deduplication, and dead-letter handling for malformed events (validation/DLQ sketch below).
- Load curated outputs into Snowflake for analytics and Redis for low-latency fraud lookups (serving-layer sketch below).
- Describe how the system handles backfills, late-arriving events, and schema evolution (replay sketch below).
- Define monitoring, alerting, and recovery procedures (lag-probe sketch below).
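The sketches below are illustrative options, not the expected solution; broker addresses, topic names, and sizes are placeholders. For the exactly-once requirement on order and payment events, one building block is an idempotent, transactional Kafka producer; consumers of these topics then read with isolation.level=read_committed, and a Flink job run with exactly-once checkpointing can preserve the guarantee through processing:

```python
import json
from confluent_kafka import Producer, KafkaException

# Idempotent, transactional producer for order/payment events.
# Broker retries cannot introduce duplicates, and a batch of sends
# either commits atomically or is aborted and retried.
producer = Producer({
    "bootstrap.servers": "BROKERS",         # placeholder
    "enable.idempotence": True,
    "acks": "all",
    "transactional.id": "order-ingest-01",  # stable per producer instance
})
producer.init_transactions()

def send_batch(events: list[dict]) -> None:
    producer.begin_transaction()
    try:
        for e in events:
            producer.produce("orders.raw", key=e["order_id"], value=json.dumps(e))
        producer.commit_transaction()
    except KafkaException:
        producer.abort_transaction()
        raise
```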
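One possible shape for the schema-validation, deduplication, and dead-letter stage, shown here as a plain Kafka consumer loop. The JSON schema is trimmed, topic names are placeholders, and the Redis-backed dedup window is just one option; in a Flink job the same logic would live in a keyed-state operator:

```python
import json
from confluent_kafka import Consumer, Producer
from jsonschema import ValidationError, validate
import redis

ORDER_SCHEMA = {  # trimmed, illustrative schema
    "type": "object",
    "required": ["event_id", "order_id", "amount"],
    "properties": {"event_id": {"type": "string"},
                   "order_id": {"type": "string"},
                   "amount": {"type": "number"}},
}

consumer = Consumer({"bootstrap.servers": "BROKERS", "group.id": "order-curator",
                     "enable.auto.commit": False, "isolation.level": "read_committed"})
producer = Producer({"bootstrap.servers": "BROKERS"})
dedup = redis.Redis(host="REDIS_HOST", port=6379)

consumer.subscribe(["orders.raw"])
while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    try:
        event = json.loads(msg.value())
        validate(event, ORDER_SCHEMA)
    except (json.JSONDecodeError, ValidationError) as exc:
        # Malformed or schema-violating event: route to the dead-letter topic with the reason.
        producer.produce("orders.dlq", value=msg.value(),
                         headers=[("error", str(exc)[:200].encode())])
        consumer.commit(message=msg)
        continue
    # Dedup on event_id: SET NX with a TTL that covers the retry/replay window.
    if dedup.set(f"seen:{event['event_id']}", 1, nx=True, ex=7 * 24 * 3600):
        producer.produce("orders.curated", key=event["order_id"], value=json.dumps(event))
    producer.poll(0)
    consumer.commit(message=msg)
```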
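A minimal sketch of the two serving-layer write paths. Bucket names, key layout, and TTLs are placeholders, and the Snowflake side assumes files landed in S3 are picked up by Snowpipe or a scheduled COPY INTO, which is configured in Snowflake rather than shown here:

```python
import gzip
import json
import time
import uuid

import boto3
import redis

s3 = boto3.client("s3")
fraud_cache = redis.Redis(host="REDIS_HOST", port=6379)

def publish_fraud_features(card_id: str, features: dict) -> None:
    """Low-latency path: fraud features keyed by card, with a short TTL."""
    key = f"fraud:{card_id}"
    fraud_cache.hset(key, mapping={k: json.dumps(v) for k, v in features.items()})
    fraud_cache.expire(key, 3600)

def land_curated_batch(records: list[dict]) -> None:
    """Analytics path: land newline-delimited JSON in S3 for Snowflake ingestion."""
    body = gzip.compress("\n".join(json.dumps(r) for r in records).encode())
    key = f"curated/orders/dt={time.strftime('%Y-%m-%d')}/{uuid.uuid4()}.json.gz"
    s3.put_object(Bucket="shopwave-curated", Key=key, Body=body)
```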
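Backfills can replay from the Kafka log itself, which the 30-day raw retention allows; the sketch below seeks a dedicated consumer group to the offsets at a chosen timestamp (group and broker names are placeholders). Late-arriving events are then a processing-side concern, typically event-time windows with a bounded allowed lateness and a side output for anything later, and schema evolution is managed at the registry level with backward-compatible Avro changes:

```python
from confluent_kafka import Consumer, TopicPartition

def replay_from(topic: str, num_partitions: int, start_ms: int) -> Consumer:
    """Backfill consumer: seek every partition to the first offset at or after start_ms.
    Runs under its own group.id so the live pipeline's consumers are untouched."""
    consumer = Consumer({"bootstrap.servers": "BROKERS", "group.id": "orders-backfill"})
    # offsets_for_times() takes the target timestamp in the .offset field of each TopicPartition.
    wanted = [TopicPartition(topic, p, start_ms) for p in range(num_partitions)]
    consumer.assign(consumer.offsets_for_times(wanted, timeout=10))
    return consumer  # poll() now yields events from start_ms onward
```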
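For monitoring, the first-line signal is consumer-group lag against the latency budget; a lag probe along these lines (group, topic, and partition count are placeholders) could feed CloudWatch alarms:

```python
from confluent_kafka import Consumer, TopicPartition

def consumer_lag(consumer: Consumer, topic: str, partitions: list[int]) -> dict[int, int]:
    """Lag of the group's committed offsets behind the log end, per partition."""
    committed = consumer.committed([TopicPartition(topic, p) for p in partitions], timeout=10)
    lag = {}
    for tp in committed:
        low, high = consumer.get_watermark_offsets(tp, timeout=10)
        # No committed offset yet: count the whole retained log as lag.
        lag[tp.partition] = high - (tp.offset if tp.offset >= 0 else low)
    return lag

# Example: probe the fraud-scoring group's lag on the curated orders topic.
probe = Consumer({"bootstrap.servers": "BROKERS", "group.id": "fraud-scorer"})
print(consumer_lag(probe, "orders.curated", partitions=list(range(72))))
```

Recovery then reuses the replay path: restart from the last committed offsets, or reseek by timestamp as in the backfill sketch above.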
Constraints
- AWS-first environment; managed services preferred where practical
- Team has prior Kafka experience but limited Flink expertise
- Incremental budget cap: $35K/month
- PCI-related payment events require encryption in transit and at rest
- Existing batch S3 lake must remain available as fallback during migration