Context
Apple Retail collects logs from Apple Store point-of-sale systems, in-store devices, apple.com retail flows, and inventory services. Today these logs reach an internal analytics environment through large nightly batch loads, which leads to stale reporting, weak failure isolation, and slow recovery during peak product launches.
Design a pipeline that ingests terabytes of retail logs per day and makes curated data available for operations, finance, and fraud analytics with both near-real-time and batch access patterns.
Scale Requirements
- Daily volume: 12-18 TB of compressed raw logs
- Peak throughput: 250K events/sec during product launches and holiday traffic
- Average event size: 1.5-3 KB per record (JSON or Avro)
- Latency targets: <2 minutes to raw landing, <10 minutes to curated operational tables, <2 hours for full daily reconciliations
- Retention: 180 days raw, 3 years curated aggregates
- Availability: 99.9% for ingestion and processing
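The scale figures can be cross-checked with a few lines of arithmetic. The Python sketch below is a back-of-envelope calculation, assuming decimal units (1 TB = 10^12 bytes, 1 KB = 10^3 bytes) and treating the real compression ratio as unknown; the function names are illustrative only.

```python
# Back-of-envelope sizing for the Scale Requirements.
# Assumptions (not stated in the brief): decimal units, 1 TB = 10**12 bytes
# and 1 KB = 10**3 bytes; 1 MB = 10**6 bytes; a 365-day year.

SECONDS_PER_DAY = 86_400
HOURS_PER_YEAR = 365 * 24


def avg_compressed_mb_per_s(tb_per_day: float) -> float:
    """Average compressed ingress rate (MB/s) implied by a daily volume."""
    return tb_per_day * 1e12 / SECONDS_PER_DAY / 1e6


def peak_mb_per_s(events_per_s: int, event_kb: float) -> float:
    """Uncompressed peak rate (MB/s): events per second times bytes per event."""
    return events_per_s * event_kb * 1e3 / 1e6


def max_compression_ratio(tb_per_day: float, events_per_s: int, event_kb: float) -> float:
    """Upper bound on the compression ratio.

    Above this ratio, the uncompressed daily average would exceed the
    uncompressed peak, which is impossible for a true peak.
    """
    return peak_mb_per_s(events_per_s, event_kb) / avg_compressed_mb_per_s(tb_per_day)


def downtime_hours_per_year(availability: float) -> float:
    """Allowed downtime per year for a given availability target."""
    return (1 - availability) * HOURS_PER_YEAR
```

With the stated ranges, the daily volume implies roughly 139-208 MB/s of compressed ingress on average, while the 250K events/sec peak is 375-750 MB/s uncompressed. Because a daily average cannot exceed the peak, the figures are mutually consistent only if raw logs compress by at most about 1.8x-5.4x (depending on event size and daily volume), so measured compression ratios are worth checking against these numbers. The 99.9% availability target allows about 8.76 hours of downtime per year.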
Requirements
- Ingest logs from Apple Store retail systems, online checkout, inventory events, and device telemetry into a unified pipeline.
- Support both stream processing for operational dashboards and batch processing for financial reconciliation and backfills.
- Enforce schema validation and deduplication, handle late-arriving events, and make reprocessing idempotent.
- Store immutable raw data plus transformed bronze/silver/gold datasets for downstream analytics.
- Orchestrate dependencies between ingestion, transformation, data quality checks, and serving layers.
- Provide monitoring for freshness, lag, failed loads, and data quality regressions.
- Enable replay of at least 7 days of source data without double-counting.
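One way to satisfy deduplication, late-arrival handling, and replay without double-counting together is an idempotent merge keyed on a stable event identifier. The in-memory Python sketch below stands in for a table merge. The `Event` fields (`event_id`, `seq`, `amount_cents`), the highest-`seq`-wins rule, and the watermark-based late cutoff are hypothetical assumptions for illustration, not part of the brief.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    source: str          # hypothetical: originating system, e.g. "pos"
    event_id: str        # hypothetical: producer-assigned unique id
    seq: int             # hypothetical: producer sequence number, used as tiebreaker
    event_time: datetime
    amount_cents: int


def merge_batch(table: dict, batch: list[Event]) -> dict:
    """Idempotent upsert keyed on (source, event_id).

    For each key, the event with the highest seq wins. Re-applying the same
    batch (or replaying it days later) leaves the table unchanged. Assumes a
    given (source, event_id, seq) always carries the same payload.
    """
    out = dict(table)
    for ev in batch:
        key = (ev.source, ev.event_id)
        current = out.get(key)
        if current is None or ev.seq > current.seq:
            out[key] = ev
    return out


def split_late(batch: list[Event], watermark: datetime, allowed_lateness: timedelta):
    """Separate on-time events from those older than watermark - allowed_lateness.

    Late events would go to a batch correction path instead of the stream.
    """
    cutoff = watermark - allowed_lateness
    on_time = [e for e in batch if e.event_time >= cutoff]
    late = [e for e in batch if e.event_time < cutoff]
    return on_time, late


def daily_revenue(table: dict) -> dict:
    """Aggregate from the deduplicated table, so duplicates never inflate totals."""
    totals: dict = {}
    for ev in table.values():
        day = ev.event_time.date()
        totals[day] = totals.get(day, 0) + ev.amount_cents
    return totals
```

In a production silver layer, the same idea would map to a keyed merge/upsert in whatever table format is chosen. Because the winner per key is deterministic, replaying 7 days of raw data rewrites identical rows instead of appending duplicates, and gold aggregates recomputed from that table come out the same.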
Constraints
- Prefer Apple-managed or Apple-standard internal platforms where possible; if external OSS is used, justify it.
- PCI-sensitive fields in retail events must be tokenized before the data reaches durable storage.
- Cross-region resiliency is required for critical ingestion paths.
- Team size is limited: 5 data engineers and 1 SRE, so operational simplicity matters.
- Avoid always-on, oversized compute to stay within budget; favor autoscaling and separation of storage and compute.
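The tokenization constraint means sensitive values must be replaced before any write to durable storage, ideally at the ingestion edge. The Python sketch below uses keyed hashing (HMAC-SHA256 from the standard `hmac` and `hashlib` modules) as a stand-in for a real tokenization service. It is one-way, so unlike a vault-based service it cannot detokenize. The field names in `SENSITIVE_FIELDS` and the `tok_` prefix are hypothetical, and the key would come from a key management system rather than code.

```python
import hashlib
import hmac

# Hypothetical set of PCI-sensitive field names. In practice this would be
# driven by schema/data-classification metadata, not a hard-coded set.
SENSITIVE_FIELDS = frozenset({"pan", "cardholder_name"})


def tokenize_value(key: bytes, field: str, value: str) -> str:
    """Deterministic, one-way token: HMAC-SHA256 over field name and value.

    Including the field name keeps tokens for different fields distinct, and
    determinism lets fraud analytics join on the token without the raw value.
    """
    mac = hmac.new(key, f"{field}:{value}".encode("utf-8"), hashlib.sha256)
    return "tok_" + mac.hexdigest()


def tokenize_event(key: bytes, event: dict) -> dict:
    """Return a copy of a flat event with sensitive fields replaced by tokens.

    Nested payloads are not handled in this sketch; the input is not mutated.
    """
    out = {}
    for name, value in event.items():
        if name in SENSITIVE_FIELDS and value is not None:
            out[name] = tokenize_value(key, name, str(value))
        else:
            out[name] = value
    return out
```

Running this step before events are written to the durable log keeps raw values out of every downstream layer, including the 180-day raw zone used for replay.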
Your answer should cover the ingestion architecture, partitioning strategy, storage layout, orchestration, data quality framework, and replay/backfill design, and explain how you would balance streaming and batch processing for Apple Retail.