Context
CodeWave, a browser-based collaborative IDE, currently stores editor events and execution logs in PostgreSQL and runs hourly batch jobs for analytics and replay. This architecture cannot support real-time collaboration diagnostics, execution telemetry, or reliable session reconstruction for thousands of concurrent users.
You need to design a data pipeline that ingests edit operations, presence updates, and code execution events, processes them in near real time, and serves both operational and analytical use cases.
Scale Requirements
- Concurrent users: 25,000 active users, 5,000 active collaboration rooms
- Peak throughput: 300,000 events/sec across edits, cursor moves, chat, and execution logs
- Event size: 0.5-4 KB JSON payloads
- Latency target: operational views < 2 seconds; analytics-ready tables < 5 minutes
- Storage: 4 TB/day raw event volume; 180-day raw retention; 2-year aggregated retention
- Reliability: 99.95% pipeline availability; no duplicate execution records in downstream billing/audit tables
Requirements
- Ingest editor operation events, room presence events, and code execution lifecycle events from web clients and execution workers.
- Support ordered processing per document/session while allowing horizontal scale across rooms.
- Validate schemas, deduplicate retries, and quarantine malformed or oversized payloads.
- Produce two outputs: low-latency operational stores for room/session debugging and warehouse tables for product analytics, usage billing, and reliability reporting.
- Support replay/backfill of historical events for incident recovery and model changes.
- Orchestrate downstream batch transformations and data quality checks with clear lineage and SLAs.
Constraints
- Infrastructure must run on AWS using managed services where practical.
- Team has strong SQL/Airflow skills but limited Flink expertise.
- Budget cap is $40K/month incremental spend.
- Execution logs may contain user code/output, so encryption at rest, row-level access controls, and 30-day deletion workflows are required for compliance.