Context
NovaPlay operates a global gaming platform with tens of millions of connected consoles across North America, Europe, and Asia. Today, gameplay telemetry is uploaded in hourly batches to cloud storage and processed overnight, which is too slow for live ops, fraud detection, crash analysis, and matchmaking health monitoring.
You need to design a real-time telemetry pipeline that ingests gameplay events from consoles, validates and enriches them, and makes them available for operational dashboards and downstream analytics with low latency.
Scale Requirements
- Active devices: 18M daily active consoles, 4M concurrent at peak
- Throughput: 1.2M events/sec sustained peak, 250K events/sec average
- Event size: 1-3 KB per event (compressed JSON or Protobuf)
- Daily volume: ~90-120 TB raw/day
- Latency target: P95 event-to-queryable latency under 60 seconds for operational use cases
- Retention: 30 days raw hot storage, 1 year cold archive, aggregates retained indefinitely
- Availability: 99.95% ingestion availability across regions
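The scale numbers can be cross-checked with some quick arithmetic. The sketch below is a back-of-envelope calculator, not part of any specified design. It assumes decimal units (1 KB = 1,000 bytes) and assumes Kinesis Data Streams as the ingestion buffer (the requirements do not name one), with provisioned shards limited to roughly 1 MB/s or 1,000 records/s of writes each and one event per record. The 1.3x headroom factor is an arbitrary illustrative choice. One thing the arithmetic surfaces: the stated 250K events/sec average at 1-3 KB per event works out to about 21.6-64.8 TB/day, below the ~90-120 TB/day estimate. The daily figure may assume larger (for example uncompressed) events or a higher average rate, so it may be worth confirming before sizing storage.

```python
"""Back-of-envelope sizing derived from the scale requirements.

Assumptions (not from the requirements): decimal units, Kinesis Data Streams
provisioned shards (~1 MB/s or 1,000 records/s write per shard), one event
per record (no producer-side aggregation), and a 1.3x headroom factor.
"""
import math

SECONDS_PER_DAY = 86_400
KB = 1_000
MB = 1_000_000
TB = 1_000_000_000_000


def daily_volume_tb(events_per_sec: float, event_kb: float) -> float:
    """Daily bytes implied by a sustained event rate and per-event size, in TB."""
    return events_per_sec * SECONDS_PER_DAY * event_kb * KB / TB


def shards_needed(events_per_sec: float, event_kb: float,
                  shard_mb_per_sec: float = 1.0,
                  shard_records_per_sec: int = 1_000,
                  headroom: float = 1.3) -> int:
    """Shard count for whichever limit binds (bytes or records), plus headroom."""
    by_bytes = events_per_sec * event_kb * KB / (shard_mb_per_sec * MB)
    by_records = events_per_sec / shard_records_per_sec
    return math.ceil(max(by_bytes, by_records) * headroom)


def downtime_budget_minutes(availability: float, days: float) -> float:
    """Minutes of allowed unavailability over `days` at the given availability."""
    return (1.0 - availability) * days * 24 * 60


if __name__ == "__main__":
    for kb in (1, 3):
        print(f"avg 250K ev/s @ {kb} KB: {daily_volume_tb(250_000, kb):.1f} TB/day")
        print(f"peak 1.2M ev/s @ {kb} KB: {shards_needed(1_200_000, kb)} shards")
    print(f"99.95% over 30 days: {downtime_budget_minutes(0.9995, 30):.1f} min")
```

At the 3 KB upper bound, peak traffic needs 3,600 shards before headroom, because the byte limit binds rather than the record limit. The 99.95% target leaves about 21.6 minutes of ingestion downtime per 30-day month.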
Requirements
- Ingest telemetry from consoles globally with regional failover and ordered processing per player_id or session_id.
- Support multiple event types: match start/end, movement, purchases, crashes, anti-cheat signals, and network quality metrics.
- Enforce schema validation, deduplication, idempotent writes, and quarantine of malformed payloads.
- Enrich events with game metadata, build version, geo/region, and device attributes.
- Deliver low-latency aggregates for live dashboards and persist raw + curated data for analytics and replay.
- Design monitoring, backfill, and replay, plus disaster recovery for regional outages.
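The following is a minimal sketch of the per-event handling these requirements imply: a stable partition key for per-player ordering, schema validation with quarantine, deduplication on an event ID, and an idempotent upsert. Everything named here is hypothetical. That includes the event-type strings, the `event_id`/`ts_ms` envelope fields, and the in-memory stores. In a real pipeline, the dedup window would live in keyed streaming state or a TTL'd table, and the curated sink would be an upsert (for example a MERGE) keyed on `event_id`.

```python
import hashlib
from collections import OrderedDict
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical names for the event types listed in the requirements.
EVENT_TYPES = {"match_start", "match_end", "movement", "purchase",
               "crash", "anticheat_signal", "network_quality"}
REQUIRED = {"event_id", "event_type", "ts_ms"}  # hypothetical minimal envelope


def partition_for(event: dict, num_partitions: int) -> int:
    """Map player_id (falling back to session_id) to a partition via a stable hash.

    Same key -> same partition, which is what preserves per-player ordering.
    hashlib is used because Python's built-in hash() is salted per process.
    """
    key = event.get("player_id") or event.get("session_id") or ""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def validate(event: dict) -> Optional[str]:
    """Return a quarantine reason, or None if the event passes."""
    missing = REQUIRED.difference(event)
    if missing:
        return f"missing fields: {sorted(missing)}"
    if not (event.get("player_id") or event.get("session_id")):
        return "no player_id or session_id"
    if event["event_type"] not in EVENT_TYPES:
        return f"unknown event_type: {event['event_type']!r}"
    return None


class DedupWindow:
    """Remembers event_ids seen within ttl_ms. Evicts in insertion order,
    which assumes events arrive roughly in timestamp order."""

    def __init__(self, ttl_ms: int):
        self.ttl_ms = ttl_ms
        self._seen: "OrderedDict[str, int]" = OrderedDict()

    def is_duplicate(self, event_id: str, ts_ms: int) -> bool:
        # Drop ids that have aged out of the window.
        while self._seen:
            _, oldest_ts = next(iter(self._seen.items()))
            if ts_ms - oldest_ts <= self.ttl_ms:
                break
            self._seen.popitem(last=False)
        if event_id in self._seen:
            return True
        self._seen[event_id] = ts_ms
        return False


@dataclass
class Outputs:
    curated: dict = field(default_factory=dict)     # event_id -> event
    quarantine: list = field(default_factory=list)  # (event, reason) pairs
    duplicates: int = 0


def process(batch: list, dedup: DedupWindow, out: Outputs) -> Outputs:
    for event in batch:
        reason = validate(event)
        if reason is not None:
            out.quarantine.append((event, reason))
            continue
        if dedup.is_duplicate(event["event_id"], event["ts_ms"]):
            out.duplicates += 1
            continue
        # Upsert keyed on event_id: a replay older than the dedup window
        # overwrites the same row instead of adding a second one.
        out.curated[event["event_id"]] = event
    return out
```

Per-key ordering holds only within a partition. The consumer therefore processes each partition sequentially, and parallelism comes from the partition count rather than from concurrency within one partition.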
Constraints
- Primary cloud is AWS; analytics warehouse is Snowflake.
- Team has strong Spark and Airflow experience, limited Flink expertise.
- Budget target: under $80K/month in incremental platform spend, excluding Snowflake compute.
- Must support GDPR/CCPA deletion workflows and must not retain raw IP addresses for more than 7 days.
- Console SDK versions will be heterogeneous for months, so schema evolution must be backward compatible.
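The sketch below covers two of these constraints: keeping raw IPs out of the long-retention copy and accepting payloads from older SDKs. All specifics are assumptions. The payload field name `client_ip` is hypothetical. The /24 (IPv4) and /48 (IPv6) prefix lengths are illustrative choices for coarse geo enrichment. A secret HMAC key is assumed acceptable for correlating events from one address, for example for anti-cheat. `OPTIONAL_DEFAULTS` stands in for fields added in later SDK versions. The raw IP itself would sit only in a separate short-lived store that is purged at the 7-day limit.

```python
import hashlib
import hmac
import ipaddress
from datetime import datetime, timedelta

RAW_IP_MAX_AGE = timedelta(days=7)  # from the constraints

# Hypothetical fields added by newer SDKs. Older payloads get these defaults,
# so every field introduced after v1 must be optional.
OPTIONAL_DEFAULTS = {"schema_version": 1, "build_version": "unknown"}


def truncate_ip(ip: str) -> str:
    """Coarsen an address to /24 (IPv4) or /48 (IPv6) for geo enrichment."""
    addr = ipaddress.ip_address(ip)
    prefix = 24 if addr.version == 4 else 48
    return str(ipaddress.ip_network(f"{addr}/{prefix}", strict=False))


def ip_token(ip: str, key: bytes) -> str:
    """Keyed HMAC-SHA256 token that correlates events from one address without
    storing it. The key must stay secret: the IPv4 space is small enough to
    brute-force an unkeyed hash."""
    return hmac.new(key, ip.encode("utf-8"), hashlib.sha256).hexdigest()[:32]


def to_curated(event: dict, key: bytes) -> dict:
    """Build the long-retention copy: defaults filled in, unknown fields from
    newer SDKs kept, raw IP replaced by a prefix and a token."""
    out = {**OPTIONAL_DEFAULTS, **event}  # values sent by the SDK win over defaults
    ip = out.pop("client_ip", None)
    if ip:
        out["ip_prefix"] = truncate_ip(ip)
        out["ip_token"] = ip_token(ip, key)
    return out


def raw_ip_expired(ingested_at: datetime, now: datetime) -> bool:
    """True once a raw-IP record is older than the 7-day limit and must be purged."""
    return now - ingested_at > RAW_IP_MAX_AGE
```

Rotating the HMAC key on a schedule would also limit how long tokens stay linkable across periods, at the cost of breaking correlation across rotation boundaries.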