Context
PulseCart, a global retail marketplace, currently lands application, order, and inventory events in Amazon S3 and runs hourly Spark batch jobs that load the results into Snowflake. Product and operations teams now need sub-2-minute freshness for fraud, inventory, and conversion dashboards, so the platform must support real-time analytics without breaking existing batch consumers.
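For reference, the current hourly path looks roughly like the sketch below; the S3 prefix, table name, and connection options are illustrative assumptions, and it presumes the Spark-Snowflake connector is available on the cluster.

```python
# Illustrative sketch of the existing hourly batch load (names and paths assumed).
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("hourly_event_load").getOrCreate()

# Assumed raw landing layout, partitioned by date and hour.
raw = spark.read.json("s3://pulsecart-raw/events/dt=2024-01-01/hour=13/")

cleaned = (
    raw.withColumn("ingested_at", F.current_timestamp())
       .dropDuplicates(["event_id"])  # assumed stable event identifier
)

# Snowflake connector options are placeholders.
sf_options = {
    "sfURL": "pulsecart.snowflakecomputing.com",
    "sfDatabase": "ANALYTICS",
    "sfSchema": "RAW",
    "sfWarehouse": "LOAD_WH",
    "sfUser": "svc_loader",
    "sfPassword": "***",
}

(cleaned.write
    .format("net.snowflake.spark.snowflake")
    .options(**sf_options)
    .option("dbtable", "EVENTS_RAW")
    .mode("append")
    .save())
```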
You are asked to design a scalable AWS + Snowflake architecture that ingests event streams and database changes, validates and transforms data, and serves analytics-ready tables with strong observability and recovery.
Scale Requirements
- Throughput: 250K events/sec peak, 60K events/sec average (see the sizing sketch after this list)
- Sources: web/mobile clickstream, checkout service events, CDC from Aurora PostgreSQL
- Event size: 1-3 KB JSON; CDC payloads up to 10 KB
- Latency target: source to Snowflake queryable in < 2 minutes P95
- Daily volume: ~12 TB raw compressed
- Retention: 180 days raw in S3, 3 years curated in Snowflake
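A quick back-of-envelope check on these figures (the shard limits assume Kinesis Data Streams purely for illustration, and the implied payload size treats the stated ~12 TB/day as approximate raw event bytes):

```python
# Back-of-envelope sizing from the figures above.
PEAK_EPS = 250_000   # events/sec at peak
AVG_EPS = 60_000     # events/sec average
DAILY_BYTES = 12e12  # ~12 TB/day as stated

# Implied average payload: ~2.3 KB, consistent with the 1-3 KB event range.
avg_event_bytes = DAILY_BYTES / (AVG_EPS * 86_400)

# Peak ingest bandwidth: ~580 MB/s.
peak_mb_per_s = PEAK_EPS * avg_event_bytes / 1e6

# If Kinesis Data Streams were used (assumption): 1 MB/s and 1,000 records/s
# per shard, so this workload is byte-bound at peak (~580 shards vs 250).
shards_by_bytes = peak_mb_per_s / 1.0
shards_by_records = PEAK_EPS / 1_000

print(f"{avg_event_bytes:.0f} B/event, {peak_mb_per_s:.0f} MB/s peak, "
      f"{max(shards_by_bytes, shards_by_records):.0f} shards at peak")
```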
Requirements
- Design ingestion for both append-only events and CDC updates/deletes.
- Support schema validation, deduplication, and replay without double-counting (an idempotent-merge sketch follows this list).
- Build a raw-to-curated ELT pattern in Snowflake for near-real-time dashboards.
- Preserve event ordering where needed for order and inventory topics (see the partition-key sketch after this list).
- Define orchestration for streaming jobs, dbt transformations, backfills, and dependency management (see the orchestration sketch after this list).
- Include monitoring for freshness, throughput, cost, and data quality (a freshness-probe sketch follows this list).
- Describe failure recovery for broker outages, bad schema deployments, late data, and Snowflake load failures.
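To make the deduplication, replay, and CDC-apply requirements concrete, one common pattern is an idempotent MERGE from a raw landing table into a curated table, deduplicating on a stable key so replays never double-count. A minimal sketch, assuming hypothetical names (RAW.ORDERS_STREAM, CURATED.ORDERS, order_id, op, extracted_at) and the Snowflake Python connector:

```python
# Idempotent CDC apply: dedup within the batch, then MERGE so replays and
# retries cannot double-count. All table and column names are hypothetical.
import snowflake.connector

MERGE_SQL = """
MERGE INTO CURATED.ORDERS AS t
USING (
    SELECT *
    FROM RAW.ORDERS_STREAM
    QUALIFY ROW_NUMBER() OVER (
        PARTITION BY order_id ORDER BY extracted_at DESC
    ) = 1                                    -- keep the latest change per key
) AS s
ON t.order_id = s.order_id
WHEN MATCHED AND s.op = 'D' THEN DELETE      -- CDC delete
WHEN MATCHED THEN UPDATE SET
    t.status = s.status, t.amount = s.amount, t.updated_at = s.extracted_at
WHEN NOT MATCHED AND s.op <> 'D' THEN INSERT (order_id, status, amount, updated_at)
    VALUES (s.order_id, s.status, s.amount, s.extracted_at)
"""

conn = snowflake.connector.connect(
    account="pulsecart", user="svc_elt", password="***",
    warehouse="ELT_WH", database="ANALYTICS",
)
try:
    conn.cursor().execute(MERGE_SQL)
finally:
    conn.close()
```

The same statement also serves backfills: reloading a window of raw data and re-running the merge converges to the same curated state.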
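For the ordering requirement, the usual lever is the partition key at produce time: keying order and inventory events by their entity ID keeps each entity's events on one shard, preserving per-entity order without needing global ordering. A sketch using Kinesis via boto3 (stream name, region, and fields are assumptions):

```python
# Preserve per-entity ordering by partitioning on the entity ID, so all events
# for a given order land on the same shard in publish order.
import json
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

def publish(events):
    # put_records accepts up to 500 records per call; batching is omitted here.
    records = [
        {
            "Data": json.dumps(e).encode("utf-8"),
            "PartitionKey": str(e["order_id"]),  # assumed ordering key
        }
        for e in events
    ]
    resp = kinesis.put_records(StreamName="pulsecart-orders", Records=records)
    # put_records is not all-or-nothing: resend only the failed entries.
    if resp["FailedRecordCount"]:
        failed = [r for r, out in zip(records, resp["Records"]) if "ErrorCode" in out]
        kinesis.put_records(StreamName="pulsecart-orders", Records=failed)
```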
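For orchestration, the continuous load path (for example Snowpipe Streaming or Snowflake streams and tasks) would run outside the scheduler, while Airflow owns dbt transformations, backfills, and cross-model dependencies, which fits a SQL/dbt-heavy team. A minimal sketch; the DAG id, cadence, project path, and model selector are assumptions:

```python
# Thin orchestration layer: run incremental dbt models, then check source freshness.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="curated_orders_elt",
    start_date=datetime(2024, 1, 1),
    schedule_interval=timedelta(minutes=5),  # dbt cadence; the real-time path runs continuously
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=1)},
) as dag:
    dbt_run = BashOperator(
        task_id="dbt_run_incremental",
        bash_command="cd /opt/dbt/pulsecart && dbt run --select orders_curated+",
    )
    dbt_freshness = BashOperator(
        task_id="dbt_source_freshness",
        bash_command="cd /opt/dbt/pulsecart && dbt source freshness",
    )
    dbt_run >> dbt_freshness
```

Backfills could reuse the same models via Airflow's backfill mechanism or a targeted `dbt run --full-refresh`.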
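For the freshness portion of monitoring, a lightweight probe can measure end-to-end lag in the curated table and publish it as a CloudWatch metric, with an alarm at 120 seconds mapping directly to the 2-minute target. Table, column, and metric names below are assumptions:

```python
# Freshness probe: measure curated-table lag in Snowflake and push it to CloudWatch.
import boto3
import snowflake.connector

LAG_SQL = """
SELECT COALESCE(DATEDIFF('second', MAX(event_ts), CURRENT_TIMESTAMP()), 0)
FROM CURATED.ORDERS
"""

conn = snowflake.connector.connect(
    account="pulsecart", user="svc_monitor", password="***",
    warehouse="MONITOR_WH", database="ANALYTICS",
)
try:
    lag_seconds = conn.cursor().execute(LAG_SQL).fetchone()[0]
finally:
    conn.close()

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")
cloudwatch.put_metric_data(
    Namespace="PulseCart/Analytics",
    MetricData=[{
        "MetricName": "CuratedOrdersFreshnessSeconds",
        "Value": float(lag_seconds),
        "Unit": "Seconds",
    }],
)
```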
Constraints
- Existing cloud footprint is AWS; Snowflake is the analytical warehouse.
- Team has strong SQL/dbt skills, moderate Spark experience, and limited Kafka operations experience.
- Incremental platform budget is $35K/month.
- Must support GDPR deletion within 72 hours and maintain auditability for finance data (see the deletion sketch after this list).
- Existing hourly batch pipeline must remain available as a fallback during migration.
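To make the GDPR constraint concrete: deletion has to reach both the curated Snowflake tables and the 180-day raw history in S3. The sketch below covers only the warehouse half and assumes a user_id column and hypothetical table names; the raw-layer half (targeted object rewrites or per-user crypto-shredding) would need its own process.

```python
# Erase one data subject from curated tables; row counts feed the audit trail.
import snowflake.connector

TABLES = ["CURATED.ORDERS", "CURATED.CLICKSTREAM", "CURATED.INVENTORY_EVENTS"]

def erase_subject(user_id: str) -> None:
    conn = snowflake.connector.connect(
        account="pulsecart", user="svc_privacy", password="***",
        warehouse="PRIVACY_WH", database="ANALYTICS",
    )
    try:
        cur = conn.cursor()
        for table in TABLES:
            # Bind the value rather than interpolating it into the SQL text.
            cur.execute(f"DELETE FROM {table} WHERE user_id = %s", (user_id,))
            print(f"{table}: {cur.rowcount} rows deleted")
    finally:
        conn.close()
```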