Context
PulseCart, a mid-size e-commerce platform, currently processes application events with hourly batch ETL into Snowflake. This architecture is sufficient for historical reporting, but it cannot support near-real-time fraud detection, inventory updates, or live operational dashboards.
You need to design a scalable real-time data processing system that ingests events from web, mobile, and backend services, validates and enriches them, and makes them queryable for analytics and downstream consumers within minutes.
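A concrete event envelope helps anchor the design before any component choices. Below is a minimal sketch of what a canonical event might look like; every field name here (event_id, occurred_at, schema_version, and so on) is an illustrative assumption, not an existing PulseCart schema:

```python
# Hypothetical canonical event envelope (all field names are assumptions,
# not an existing PulseCart schema). Producers stamp event_id and
# occurred_at; the pipeline adds received_at at ingest time.
EXAMPLE_EVENT = {
    "event_id": "7f9c2e5a-...",                 # producer-generated UUID, enables dedup
    "event_type": "checkout.completed",         # dotted domain.action convention
    "occurred_at": "2024-05-01T12:00:00.000Z",  # producer clock (may arrive late)
    "received_at": "2024-05-01T12:00:03.412Z",  # ingest clock
    "source": "checkout-service",
    "schema_version": 3,                        # supports schema evolution
    "payload": {
        "order_id": "ord_123",
        "user_id": "usr_456",
        "amount_cents": 4599,                   # PCI-sensitive card fields excluded
    },
}
```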
Scale Requirements
- Sources: Web SDK, mobile SDK, checkout service, order service, inventory service
- Throughput: 250K events/second peak; 60K events/second sustained average
- Event size: 1-3 KB JSON
- Daily volume: ~12 TB of raw event data, uncompressed (see the capacity check below)
- Latency target: P95 end-to-end under 2 minutes
- Retention: 30 days hot, 1 year cold archive
- Availability: 99.9% pipeline uptime
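These numbers are worth sanity-checking before choosing components. Assuming ~2 KB per event (the midpoint of the stated range), the sustained rate works out to roughly 10 TB/day, consistent with the ~12 TB figure once peaks are included. The sketch below also estimates shard counts for Kinesis Data Streams, which is assumed here only as a candidate managed option; the 1 MB/s and 1,000 records/s per-shard ingest limits are AWS-documented:

```python
AVG_EVENT_KB = 2          # midpoint of the 1-3 KB range (assumption)
PEAK_EPS = 250_000        # peak events/second
AVG_EPS = 60_000          # sustained average events/second

# Steady-state daily volume in TB (decimal units).
daily_tb = AVG_EPS * AVG_EVENT_KB * 86_400 / 1e9
print(f"steady-state volume ~ {daily_tb:.1f} TB/day")   # ~10.4, consistent with ~12 TB

# Kinesis ingest limits per shard: 1 MB/s and 1,000 records/s.
shards_by_records = PEAK_EPS / 1_000                    # 250 shards by record rate
shards_by_bytes = PEAK_EPS * AVG_EVENT_KB / 1_024       # ~488 shards by bandwidth
print(f"shards needed at peak ~ {max(shards_by_records, shards_by_bytes):.0f}")
```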
Requirements
- Design ingestion for high-throughput event streams with replay capability and ordered processing where needed (see the producer sketch after this list).
- Implement schema validation, deduplication, late-event handling, and dead-letter routing (see the validation sketch below).
- Support real-time transformations for sessionization, order state changes, and inventory aggregates (see the windowing sketch below).
- Load raw and curated data into a warehouse for BI while also supporting downstream operational consumers.
- Define orchestration for streaming jobs, backfills, schema evolution, and recovery workflows (see the Airflow sketch below).
- Propose monitoring for throughput, lag, latency, data quality, and cost (see the alarm sketch below).
- Explain how you would guarantee idempotency and minimize data loss during failures; the validation sketch below shows one dedup-based idempotency pattern.
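The brief mandates AWS but does not name a streaming backbone; given the team's limited Kafka operations experience, Kinesis Data Streams is assumed in the sketches that follow as the managed option. A minimal producer sketch: partitioning by user_id keeps all of one user's events on the same shard for ordered processing, and the stream's retention window (configurable up to 365 days) provides replay. The stream name and field paths are illustrative.

```python
import json
import boto3

kinesis = boto3.client("kinesis")

def put_events(events, stream_name="pulsecart-events"):  # stream name is hypothetical
    """Batch-write events, partitioned by user_id so a given user's events
    land on one shard and are consumed in order. The API caps each call at
    500 records; chunking is omitted for brevity."""
    records = [
        {
            "Data": json.dumps(e).encode("utf-8"),
            "PartitionKey": e["payload"]["user_id"],
        }
        for e in events
    ]
    resp = kinesis.put_records(StreamName=stream_name, Records=records)
    # put_records is not all-or-nothing: retry only the failed subset.
    # (One retry shown; production code would loop with backoff.)
    if resp["FailedRecordCount"]:
        failed = [r for r, out in zip(records, resp["Records"]) if "ErrorCode" in out]
        kinesis.put_records(StreamName=stream_name, Records=failed)
```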
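For validation, deduplication, and dead-lettering, one common pattern is: validate each event against a registered JSON Schema, claim its event_id with a DynamoDB conditional write, and route anything that fails to an SQS dead-letter queue. The schema, table, and queue names below are assumptions. The conditional write is also what makes the pipeline idempotent: a replayed or retried event fails the condition and is dropped instead of being processed twice.

```python
import json
import boto3
from jsonschema import validate, ValidationError

dynamodb = boto3.client("dynamodb")
sqs = boto3.client("sqs")

EVENT_SCHEMA = {  # trimmed illustrative schema, not a real registry entry
    "type": "object",
    "required": ["event_id", "event_type", "occurred_at", "payload"],
    "properties": {"event_id": {"type": "string"}},
}
DEDUP_TABLE = "event-dedup"                    # hypothetical; a TTL attribute
DLQ_URL = "https://sqs.../events-dlq"          # would bound table growth

def process(event, raw_body):
    """Return the event if it is valid and first-seen; otherwise None."""
    try:
        validate(instance=event, schema=EVENT_SCHEMA)
        # Conditional put succeeds only for a first-seen event_id, making
        # downstream writes idempotent under consumer retries and replays.
        dynamodb.put_item(
            TableName=DEDUP_TABLE,
            Item={"event_id": {"S": event["event_id"]}},
            ConditionExpression="attribute_not_exists(event_id)",
        )
    except ValidationError as err:
        sqs.send_message(
            QueueUrl=DLQ_URL,
            MessageBody=json.dumps({"reason": str(err), "raw": raw_body}),
        )
        return None
    except dynamodb.exceptions.ConditionalCheckFailedException:
        return None  # duplicate: already processed, drop silently
    return event
```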
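For sessionization and aggregates, a toy tumbling-window counter illustrating one late-event policy: events arriving within an allowed lateness still update their window, while anything later is rejected for dead-lettering or a late-event counter. In production this logic would more likely run in a managed engine (e.g. Amazon Managed Service for Apache Flink), named here as an option rather than a given; the window sizes are assumptions.

```python
from collections import defaultdict

WINDOW_S = 300            # 5-minute tumbling windows (assumption)
ALLOWED_LATENESS_S = 120  # grace period before a window closes (assumption)

class TumblingCounter:
    """Counts events per (user, window). A window closes ALLOWED_LATENESS_S
    after its end, measured against a watermark of max event time seen."""
    def __init__(self):
        self.counts = defaultdict(int)
        self.watermark = 0  # max event timestamp observed so far

    def add(self, user_id, event_ts):
        self.watermark = max(self.watermark, event_ts)
        window_start = event_ts - event_ts % WINDOW_S
        window_close = window_start + WINDOW_S + ALLOWED_LATENESS_S
        if self.watermark >= window_close:
            return False  # too late: caller dead-letters or counts it
        self.counts[(user_id, window_start)] += 1
        return True
```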
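Orchestration can lean on the team's existing Airflow strength: backfills fall out of Airflow's own catchup mechanism, and a failed day is recovered by clearing and re-running its task. A sketch of such a DAG follows; the DAG id, S3 layout, and load logic are hypothetical, and the real load step might issue a Snowflake COPY INTO over a curated S3 partition.

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def replay_partition(ds, **_):
    # Re-run the curated load for one execution date, e.g. by issuing a
    # Snowflake COPY INTO over s3://pulsecart-events/curated/dt=<ds>/
    # (bucket layout hypothetical). Because loads are keyed on event_id,
    # re-running a day is idempotent and therefore safe.
    print(f"replaying partition dt={ds}")

with DAG(
    dag_id="events_backfill",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=True,  # catchup doubles as the historical backfill mechanism
) as dag:
    PythonOperator(task_id="replay_partition", python_callable=replay_partition)
```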
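For lag monitoring, Kinesis publishes GetRecords.IteratorAgeMilliseconds, the age of the oldest record not yet read by consumers. A CloudWatch alarm sketch sized against the 2-minute P95 budget follows; the threshold, alarm name, and stream name are assumptions.

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Alarm when the oldest unread record exceeds 60s, i.e. half of the
# 2-minute end-to-end budget is already spent waiting in the stream.
cloudwatch.put_metric_alarm(
    AlarmName="pulsecart-events-consumer-lag",  # hypothetical name
    Namespace="AWS/Kinesis",
    MetricName="GetRecords.IteratorAgeMilliseconds",
    Dimensions=[{"Name": "StreamName", "Value": "pulsecart-events"}],
    Statistic="Maximum",
    Period=60,
    EvaluationPeriods=3,
    Threshold=60_000,
    ComparisonOperator="GreaterThanThreshold",
)
```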
Constraints
- AWS is the mandated cloud platform.
- Team has strong SQL/Airflow skills but limited Kafka operations experience.
- Incremental budget is capped at $35K/month.
- PCI-sensitive payment fields must not be persisted in raw analytical storage (see the scrub sketch after this list).
- Existing Snowflake dashboards must remain available during migration.
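The PCI constraint is cheapest to enforce at the validation step, before anything reaches raw storage. A minimal scrub sketch follows; the field list is an assumption, and a real deployment would substitute a token from a dedicated vault service rather than simply dropping fields if fraud models need card-level joins.

```python
SENSITIVE_FIELDS = {"card_number", "cvv", "card_expiry"}  # illustrative list

def scrub(payload: dict) -> dict:
    """Remove PCI-sensitive fields before the event reaches raw storage.
    If fraud features must correlate cards, insert a vault-issued token
    here instead of persisting anything derived from the PAN."""
    return {k: v for k, v in payload.items() if k not in SENSITIVE_FIELDS}
```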