Context
PulseCart, a global retail marketplace, currently lands application, order, and inventory events in Amazon S3 and runs hourly Spark batch jobs that load the results into Snowflake. Product and operations teams now need sub-2-minute freshness for fraud, inventory, and conversion dashboards, so the platform must support real-time analytics without breaking existing batch consumers.
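For reference, the current hourly path looks roughly like the sketch below; the S3 prefix, table name, and connection options are illustrative assumptions, and it presumes the Spark-Snowflake connector is available on the cluster.

```python
# Illustrative sketch of the existing hourly batch load (names and paths assumed).
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("hourly_event_load").getOrCreate()

# Assumed raw landing layout, partitioned by date and hour.
raw = spark.read.json("s3://pulsecart-raw/events/dt=2024-01-01/hour=13/")

cleaned = (
    raw.withColumn("ingested_at", F.current_timestamp())
       .dropDuplicates(["event_id"])  # assumed stable event identifier
)

# Snowflake connector options are placeholders.
sf_options = {
    "sfURL": "pulsecart.snowflakecomputing.com",
    "sfDatabase": "ANALYTICS",
    "sfSchema": "RAW",
    "sfWarehouse": "LOAD_WH",
    "sfUser": "svc_loader",
    "sfPassword": "***",
}

(cleaned.write
    .format("net.snowflake.spark.snowflake")
    .options(**sf_options)
    .option("dbtable", "EVENTS_RAW")
    .mode("append")
    .save())
```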
You are asked to design a scalable AWS + Snowflake architecture that ingests event streams and database changes, validates and transforms data, and serves analytics-ready tables with strong observability and recovery.
Scale Requirements
- Throughput: 250K events/sec peak, 60K events/sec average (see the sizing sketch after this list)
- Sources: web/mobile clickstream, checkout service events, CDC from Aurora PostgreSQL
- Event size: 1-3 KB JSON; CDC payloads up to 10 KB
- Latency target: source to Snowflake queryable in < 2 minutes P95
- Daily volume: ~12 TB raw compressed
- Retention: 180 days raw in S3, 3 years curated in Snowflake
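A quick back-of-envelope check on these figures (the shard limits assume Kinesis Data Streams purely for illustration, and the implied payload size treats the stated ~12 TB/day as approximate raw event bytes):

```python
# Back-of-envelope sizing from the figures above.
PEAK_EPS = 250_000   # events/sec at peak
AVG_EPS = 60_000     # events/sec average
DAILY_BYTES = 12e12  # ~12 TB/day as stated

# Implied average payload: ~2.3 KB, consistent with the 1-3 KB event range.
avg_event_bytes = DAILY_BYTES / (AVG_EPS * 86_400)

# Peak ingest bandwidth: ~580 MB/s.
peak_mb_per_s = PEAK_EPS * avg_event_bytes / 1e6

# If Kinesis Data Streams were used (assumption): 1 MB/s and 1,000 records/s
# per shard, so this workload is byte-bound at peak (~580 shards vs 250).
shards_by_bytes = peak_mb_per_s / 1.0
shards_by_records = PEAK_EPS / 1_000

print(f"{avg_event_bytes:.0f} B/event, {peak_mb_per_s:.0f} MB/s peak, "
      f"{max(shards_by_bytes, shards_by_records):.0f} shards at peak")
```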
Requirements
- Design ingestion for both append-only events and CDC updates/deletes.
- Support schema validation, deduplication, and replay without double-counting (an idempotent-merge sketch follows this list).
- Build a raw-to-curated ELT pattern in Snowflake for near-real-time dashboards.
- Preserve event ordering where needed for order and inventory topics (see the partition-key sketch after this list).
- Define orchestration for streaming jobs, dbt transformations, backfills, and dependency management (see the orchestration sketch after this list).
- Include monitoring for freshness, throughput, cost, and data quality (a freshness-probe sketch follows this list).
- Describe failure recovery for broker outages, bad schema deployments, late data, and Snowflake load failures.
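To make the deduplication, replay, and CDC-apply requirements concrete, one common pattern is an idempotent MERGE from a raw landing table into a curated table, deduplicating on a stable key so replays never double-count. A minimal sketch, assuming hypothetical names (RAW.ORDERS_STREAM, CURATED.ORDERS, order_id, op, extracted_at) and the Snowflake Python connector:

```python
# Idempotent CDC apply: dedup within the batch, then MERGE so replays and
# retries cannot double-count. All table and column names are hypothetical.
import snowflake.connector

MERGE_SQL = """
MERGE INTO CURATED.ORDERS AS t
USING (
    SELECT *
    FROM RAW.ORDERS_STREAM
    QUALIFY ROW_NUMBER() OVER (
        PARTITION BY order_id ORDER BY extracted_at DESC
    ) = 1                                    -- keep the latest change per key
) AS s
ON t.order_id = s.order_id
WHEN MATCHED AND s.op = 'D' THEN DELETE      -- CDC delete
WHEN MATCHED THEN UPDATE SET
    t.status = s.status, t.amount = s.amount, t.updated_at = s.extracted_at
WHEN NOT MATCHED AND s.op <> 'D' THEN INSERT (order_id, status, amount, updated_at)
    VALUES (s.order_id, s.status, s.amount, s.extracted_at)
"""

conn = snowflake.connector.connect(
    account="pulsecart", user="svc_elt", password="***",
    warehouse="ELT_WH", database="ANALYTICS",
)
try:
    conn.cursor().execute(MERGE_SQL)
finally:
    conn.close()
```

The same statement also serves backfills: reloading a window of raw data and re-running the merge converges to the same curated state.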
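For the ordering requirement, the usual lever is the partition key at produce time: keying order and inventory events by their entity ID keeps each entity's events on one shard, preserving per-entity order without needing global ordering. A sketch using Kinesis via boto3 (stream name, region, and fields are assumptions):

```python
# Preserve per-entity ordering by partitioning on the entity ID, so all events
# for a given order land on the same shard in publish order.
import json
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

def publish(events):
    # put_records accepts up to 500 records per call; batching is omitted here.
    records = [
        {
            "Data": json.dumps(e).encode("utf-8"),
            "PartitionKey": str(e["order_id"]),  # assumed ordering key
        }
        for e in events
    ]
    resp = kinesis.put_records(StreamName="pulsecart-orders", Records=records)
    # put_records is not all-or-nothing: resend only the failed entries.
    if resp["FailedRecordCount"]:
        failed = [r for r, out in zip(records, resp["Records"]) if "ErrorCode" in out]
        kinesis.put_records(StreamName="pulsecart-orders", Records=failed)
```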
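For orchestration, the continuous load path (for example Snowpipe Streaming or Snowflake streams and tasks) would run outside the scheduler, while Airflow owns dbt transformations, backfills, and cross-model dependencies, which fits a SQL/dbt-heavy team. A minimal sketch; the DAG id, cadence, project path, and model selector are assumptions:

```python
# Thin orchestration layer: run incremental dbt models, then check source freshness.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="curated_orders_elt",
    start_date=datetime(2024, 1, 1),
    schedule_interval=timedelta(minutes=5),  # dbt cadence; the real-time path runs continuously
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=1)},
) as dag:
    dbt_run = BashOperator(
        task_id="dbt_run_incremental",
        bash_command="cd /opt/dbt/pulsecart && dbt run --select orders_curated+",
    )
    dbt_freshness = BashOperator(
        task_id="dbt_source_freshness",
        bash_command="cd /opt/dbt/pulsecart && dbt source freshness",
    )
    dbt_run >> dbt_freshness
```

Backfills could reuse the same models via Airflow's backfill mechanism or a targeted `dbt run --full-refresh`.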
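For the freshness portion of monitoring, a lightweight probe can measure end-to-end lag in the curated table and publish it as a CloudWatch metric, with an alarm at 120 seconds mapping directly to the 2-minute target. Table, column, and metric names below are assumptions:

```python
# Freshness probe: measure curated-table lag in Snowflake and push it to CloudWatch.
import boto3
import snowflake.connector

LAG_SQL = """
SELECT COALESCE(DATEDIFF('second', MAX(event_ts), CURRENT_TIMESTAMP()), 0)
FROM CURATED.ORDERS
"""

conn = snowflake.connector.connect(
    account="pulsecart", user="svc_monitor", password="***",
    warehouse="MONITOR_WH", database="ANALYTICS",
)
try:
    lag_seconds = conn.cursor().execute(LAG_SQL).fetchone()[0]
finally:
    conn.close()

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")
cloudwatch.put_metric_data(
    Namespace="PulseCart/Analytics",
    MetricData=[{
        "MetricName": "CuratedOrdersFreshnessSeconds",
        "Value": float(lag_seconds),
        "Unit": "Seconds",
    }],
)
```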
Constraints
- Existing cloud footprint is AWS; Snowflake is the analytical warehouse.
- Team has strong SQL/dbt skills, moderate Spark experience, and limited Kafka operations experience.
- Incremental platform budget is $35K/month.
- Must support GDPR deletion within 72 hours and maintain auditability for finance data (see the deletion sketch after this list).
- Existing hourly batch pipeline must remain available as a fallback during migration.
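To make the GDPR constraint concrete: deletion has to reach both the curated Snowflake tables and the 180-day raw history in S3. The sketch below covers only the warehouse half and assumes a user_id column and hypothetical table names; the raw-layer half (targeted object rewrites or per-user crypto-shredding) would need its own process.

```python
# Erase one data subject from curated tables; row counts feed the audit trail.
import snowflake.connector

TABLES = ["CURATED.ORDERS", "CURATED.CLICKSTREAM", "CURATED.INVENTORY_EVENTS"]

def erase_subject(user_id: str) -> None:
    conn = snowflake.connector.connect(
        account="pulsecart", user="svc_privacy", password="***",
        warehouse="PRIVACY_WH", database="ANALYTICS",
    )
    try:
        cur = conn.cursor()
        for table in TABLES:
            # Bind the value rather than interpolating it into the SQL text.
            cur.execute(f"DELETE FROM {table} WHERE user_id = %s", (user_id,))
            print(f"{table}: {cur.rowcount} rows deleted")
    finally:
        conn.close()
```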