Context
AutoRABIT Holding wants a unified pipeline for DevOps telemetry across AutoRABIT CI, AutoRABIT Release Management, AutoRABIT Backup, and AutoRABIT CodeScan. Today, each product emits logs, deployment events, scan results, and pipeline metadata into separate stores, making it hard for engineering managers to track release health, failure trends, and compliance in one place.
Design a modern data pipeline that consolidates these DevOps signals into an analytics-ready platform for operational dashboards and near-real-time alerts.
Scale Requirements
- Sources: 4 AutoRABIT products + Git provider webhooks + Kubernetes audit logs
- Throughput: 25K events/sec peak, 5K events/sec average
- Event size: 1-8 KB JSON
- Daily volume: ~1.5 TB raw, 450 GB compressed
- Latency target: critical events queryable in < 2 minutes; batch aggregates refreshed every 15 minutes
- Retention: raw data 180 days, curated warehouse tables 2 years
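As a sanity check on the figures above, the following back-of-envelope sketch derives the implied average event size, compression ratio, retained raw footprint, and worst-case peak bandwidth. It assumes decimal units (1 TB = 1000 GB) and that the daily volume corresponds to the average rate; the function and key names are illustrative only.

```python
SECONDS_PER_DAY = 86_400


def size_pipeline(
    avg_eps=5_000,              # average events/sec (from the scale requirements)
    peak_eps=25_000,            # peak events/sec
    raw_tb_per_day=1.5,         # raw daily volume
    compressed_gb_per_day=450,  # compressed daily volume
    raw_retention_days=180,     # raw retention window
    max_event_kb=8,             # upper bound of the stated event size range
):
    """Derive rough capacity numbers; decimal units are an assumption."""
    events_per_day = avg_eps * SECONDS_PER_DAY
    # 1 TB = 1e9 KB in decimal units
    implied_avg_event_kb = raw_tb_per_day * 1e9 / events_per_day
    compression_ratio = raw_tb_per_day * 1000 / compressed_gb_per_day
    retained_compressed_tb = compressed_gb_per_day * raw_retention_days / 1000
    # Worst case: every event at peak is the maximum size
    peak_mb_per_sec_worst_case = peak_eps * max_event_kb / 1000
    return {
        "events_per_day": events_per_day,
        "implied_avg_event_kb": implied_avg_event_kb,
        "compression_ratio": compression_ratio,
        "retained_compressed_tb": retained_compressed_tb,
        "peak_mb_per_sec_worst_case": peak_mb_per_sec_worst_case,
    }


if __name__ == "__main__":
    for key, value in size_pipeline().items():
        print(f"{key}: {value:,.2f}")
```

Under these assumptions the implied average event size lands inside the stated 1-8 KB range, which suggests the throughput and volume figures are mutually consistent.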
Requirements
- Ingest deployment events, build logs, backup job results, static-analysis findings, and environment health events from AutoRABIT platforms.
- Support both streaming ingestion for operational visibility and batch ELT for curated reporting models.
- Enforce schema validation, deduplication by event_id, and lineage from source system to warehouse table.
- Build warehouse tables for deployments, pipeline_runs, security_findings, and backup_executions.
- Orchestrate dependencies so that raw ingestion, quality checks, and transformation jobs are independently retryable and idempotent.
- Provide monitoring for freshness, failed loads, data quality drift, and cost.
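The validation, deduplication, lineage, and idempotency requirements above can be illustrated with a minimal in-memory sketch. Only event_id and the four table names come from the requirements; the other envelope fields (source, event_type, occurred_at), the event_type values, the `_lineage` column, and the function names are hypothetical. A real pipeline would likely replace the dictionaries with a keyed upsert in the warehouse.

```python
from typing import Any

Event = dict[str, Any]

# Hypothetical event envelope; only event_id is named in the requirements.
REQUIRED_FIELDS = {
    "event_id": str,
    "source": str,
    "event_type": str,
    "occurred_at": str,
}

# Hypothetical routing from event_type to the warehouse tables listed above.
ROUTES = {
    "deployment": "deployments",
    "pipeline_run": "pipeline_runs",
    "security_finding": "security_findings",
    "backup_execution": "backup_executions",
}


def validate(event: Event) -> list[str]:
    """Return a list of schema errors; an empty list means the event is valid."""
    errors = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in event:
            errors.append(f"missing field: {name}")
        elif not isinstance(event[name], expected):
            errors.append(f"wrong type for field: {name}")
    event_type = event.get("event_type")
    if isinstance(event_type, str) and event_type not in ROUTES:
        errors.append(f"unknown event_type: {event_type}")
    return errors


def load_batch(tables: dict[str, dict[str, Event]], batch: list[Event]) -> list[dict]:
    """Upsert valid events keyed by event_id; return the rejected ones.

    Because rows are keyed by event_id, replaying the same batch after a
    retry leaves the tables unchanged (idempotent), and duplicates within
    a batch collapse to a single row.
    """
    rejected = []
    for event in batch:
        errors = validate(event)
        if errors:
            rejected.append({"event": event, "errors": errors})
            continue
        table = ROUTES[event["event_type"]]
        # Stamp lineage from source system to target warehouse table.
        row = {**event, "_lineage": {"source": event["source"], "table": table}}
        tables.setdefault(table, {})[event["event_id"]] = row
    return rejected
```

Keying writes on event_id means deduplication and retry safety fall out of the same mechanism, which keeps the design small enough for a four-person team to reason about.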
Constraints
- Existing stack is AWS-centric and already uses Kubernetes for application workloads.
- Team size is small: 3 data engineers and 1 platform engineer.
- Must avoid vendor sprawl; prefer managed services where operational burden is low.
- Compliance requires auditability and immutable raw retention for 180 days.
- Budget target: incremental platform cost under $18K/month.
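To make the budget constraint concrete, the sketch below converts the monthly cap into a unit-cost envelope using the stated scale figures. It assumes a 30-day month, decimal units, and that the daily volume reflects the 5K events/sec average; no vendor pricing is implied, and the function name is illustrative.

```python
SECONDS_PER_DAY = 86_400


def unit_cost_envelope(
    budget_usd_per_month=18_000,  # stated budget target
    raw_tb_per_day=1.5,           # stated raw daily volume
    avg_eps=5_000,                # stated average events/sec
    days_per_month=30,            # assumption
):
    """Return the all-in spend ceiling per raw GB and per million events."""
    raw_gb_per_month = raw_tb_per_day * 1000 * days_per_month
    events_per_month = avg_eps * SECONDS_PER_DAY * days_per_month
    return {
        "usd_per_raw_gb": budget_usd_per_month / raw_gb_per_month,
        "usd_per_million_events": budget_usd_per_month / (events_per_month / 1e6),
    }


if __name__ == "__main__":
    for key, value in unit_cost_envelope().items():
        print(f"{key}: {value:.2f}")
```

Any candidate design can then be checked against this ceiling by summing its ingestion, storage, compute, and query costs per GB.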
In your answer, explain which technologies are critical for this modern DevOps data pipeline, why they fit AutoRABIT Holding’s use case, and how you would design for reliability, observability, and low-latency analytics.