Context
Meta is building a next-generation embedded device that streams high-frequency accelerometer telemetry for motion analysis, fault detection, and fleet diagnostics. The current prototype writes CSV logs to local flash and uploads them in bulk when connectivity returns, which causes data loss during power events, poor observability, and multi-hour delays before engineers can query data in internal analytics systems.
You need to design a production-grade acquisition and data pipeline from device firmware through cloud ingestion and downstream processing, using Meta-native infrastructure where appropriate.
Scale Requirements
- Sensors: 3-axis accelerometer at 8 kHz per device
- Payload: 3 axes + temperature + timestamp + sequence number, ~32 bytes/sample after binary packing
- Per-device throughput: ~256 KB/s raw, ~22 GB/day if continuously connected
- Fleet size: 250K devices globally, with 20K concurrent online at peak
- Cloud ingest target: sustain 5 GB/s burst ingest across regions (roughly 20K concurrent devices × ~256 KB/s uncompressed)
- Latency: hot-path telemetry available for querying/alerting in < 10 seconds; durable raw storage in < 60 seconds
- Retention: raw binary for 30 days, downsampled features for 1 year
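The throughput figures above can be rechecked with a few lines of arithmetic. The sketch below is illustrative only: the field layout (`SAMPLE_FORMAT`), the field widths, and the helper names `pack_sample` and `unpack_sample` are assumptions chosen to total 32 bytes, not a layout given in this brief. Decimal units are assumed (1 KB = 1000 bytes).

```python
import struct

# Hypothetical little-endian layout that totals 32 bytes (an assumption, not a spec):
# uint64 timestamp_us | uint32 sequence | float32 x, y, z | float32 temperature | uint32 reserved
SAMPLE_FORMAT = "<QI4fI"
SAMPLE_BYTES = struct.calcsize(SAMPLE_FORMAT)

SAMPLE_RATE_HZ = 8_000            # from the scale requirements
PEAK_CONCURRENT_DEVICES = 20_000  # from the scale requirements
SECONDS_PER_DAY = 86_400


def pack_sample(timestamp_us, sequence, x, y, z, temperature_c):
    """Pack one sample into the assumed 32-byte record (reserved word set to 0)."""
    return struct.pack(SAMPLE_FORMAT, timestamp_us, sequence, x, y, z, temperature_c, 0)


def unpack_sample(blob):
    """Inverse of pack_sample; the reserved word is dropped."""
    timestamp_us, sequence, x, y, z, temperature_c, _reserved = struct.unpack(SAMPLE_FORMAT, blob)
    return timestamp_us, sequence, x, y, z, temperature_c


# Per-device and fleet-level rates derived from the stated sample rate and size.
per_device_bytes_per_s = SAMPLE_RATE_HZ * SAMPLE_BYTES
per_device_bytes_per_day = per_device_bytes_per_s * SECONDS_PER_DAY
peak_fleet_bytes_per_s = per_device_bytes_per_s * PEAK_CONCURRENT_DEVICES

print(f"per device: {per_device_bytes_per_s / 1e3:.0f} KB/s")
print(f"per device: {per_device_bytes_per_day / 1e9:.1f} GB/day")
print(f"peak fleet: {peak_fleet_bytes_per_s / 1e9:.2f} GB/s")
```

Under these assumptions the peak fleet rate comes out just above the 5 GB/s ingest target, which suggests the target corresponds to all 20K peak devices streaming uncompressed data, so any on-device compression would add headroom.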
Requirements
- Design device-side buffering to tolerate up to 30 minutes of network outage without losing critical samples.
- Define the transmission protocol, batching strategy, compression, and retry semantics for unreliable mobile/Wi-Fi links.
- Build a cloud ingestion path using Scribe and downstream stream processing to validate schema, detect gaps/duplicates, and preserve ordering per device.
- Produce two outputs: raw immutable telemetry and derived feature streams (RMS, FFT bands, shock events) for analytics.
- Support replay/backfill from raw storage without double-counting downstream aggregates.
- Specify monitoring, alerting, and operational recovery for firmware bugs, clock drift, packet loss, and regional outages.
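The gap/duplicate and replay requirements above could be approached with per-device sequence tracking. The following is a minimal sketch, not a Scribe or stream-processor API: `DeviceState` and `ingest` are hypothetical names, batches are assumed to carry a first sequence number and a sample count, and sequence-number wraparound is deliberately not handled.

```python
from dataclasses import dataclass, field


@dataclass
class DeviceState:
    next_seq: int = 0                          # high-water mark: next unseen sequence number
    gaps: list = field(default_factory=list)   # missing half-open ranges [start, end)
    accepted: int = 0                          # samples counted exactly once
    duplicates: int = 0                        # samples seen again and ignored


def ingest(state, first_seq, count):
    """Classify a batch covering [first_seq, first_seq + count).

    Returns (accepted, duplicates) for this batch. Replaying a batch that
    was already ingested yields accepted == 0, so downstream aggregates
    driven by the accepted samples are not double-counted.
    """
    start, end = first_seq, first_seq + count
    accepted = 0

    # Part below the high-water mark: it either fills a recorded gap
    # (late arrival or backfill) or is a duplicate.
    old_end = min(end, state.next_seq)
    if start < old_end:
        remaining = []
        for g_start, g_end in state.gaps:
            lo, hi = max(g_start, start), min(g_end, old_end)
            if lo < hi:
                accepted += hi - lo
                if g_start < lo:
                    remaining.append((g_start, lo))
                if hi < g_end:
                    remaining.append((hi, g_end))
            else:
                remaining.append((g_start, g_end))
        state.gaps = remaining

    # Part at or beyond the high-water mark: new data, possibly after a gap.
    if end > state.next_seq:
        if start > state.next_seq:
            state.gaps.append((state.next_seq, start))
        accepted += end - max(start, state.next_seq)
        state.next_seq = end

    duplicates = count - accepted
    state.accepted += accepted
    state.duplicates += duplicates
    return accepted, duplicates


# Example: in-order batch, a batch after a gap, a replay, a backfill, a partial overlap.
state = DeviceState()
results = [
    ingest(state, 0, 100),
    ingest(state, 200, 100),
    ingest(state, 0, 100),
    ingest(state, 100, 100),
    ingest(state, 250, 100),
]
print(results, state.gaps, state.accepted)
```

In a real pipeline this state would need to be durable and keyed by device, and the sequence field would need to be wide enough or wrap-aware: a 32-bit counter at 8 kHz wraps after about 6.2 days.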
Constraints
- Device MCU has 512 KB of RAM and 8 GB of flash; the flash is shared with other logs.
- Battery impact from radio usage must stay under 5% daily overhead.
- Assume intermittent connectivity and device clocks that can drift by ±100 ppm.
- Data must be encrypted in transit and at rest; PII is not present, but fleet identifiers are sensitive.
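The memory and clock constraints above translate into concrete budgets for the 30-minute outage requirement. The sketch below works them out under stated assumptions: 512 KB is taken as 512 × 1024 bytes, 8 GB as 8 × 10^9 bytes, data is stored uncompressed, and the helper names are hypothetical.

```python
BYTES_PER_SECOND = 256_000      # ~256 KB/s per device, from the scale requirements
SAMPLE_RATE_HZ = 8_000
OUTAGE_SECONDS = 30 * 60        # network outage to tolerate
RAM_BYTES = 512 * 1024          # assumption: binary kilobytes
FLASH_BYTES = 8 * 10**9         # assumption: decimal gigabytes, before other logs
DRIFT_PPM = 100                 # worst-case clock drift magnitude


def outage_buffer_bytes(bytes_per_second, outage_seconds):
    """Uncompressed bytes produced while the link is down."""
    return bytes_per_second * outage_seconds


def worst_case_drift_seconds(elapsed_seconds, drift_ppm):
    """Maximum clock error accumulated over elapsed_seconds without resync."""
    return elapsed_seconds * drift_ppm / 1_000_000


outage_bytes = outage_buffer_bytes(BYTES_PER_SECOND, OUTAGE_SECONDS)
ram_seconds = RAM_BYTES / BYTES_PER_SECOND            # how long RAM alone could buffer
flash_fraction = outage_bytes / FLASH_BYTES           # share of total flash for one outage
drift_s = worst_case_drift_seconds(OUTAGE_SECONDS, DRIFT_PPM)
drift_in_sample_periods = drift_s * SAMPLE_RATE_HZ    # drift expressed in 125 us sample periods

print(f"outage buffer: {outage_bytes / 1e6:.1f} MB")
print(f"RAM holds about {ram_seconds:.2f} s of samples")
print(f"flash share for one outage: {flash_fraction:.2%}")
print(f"drift over outage: {drift_s * 1e3:.0f} ms (~{drift_in_sample_periods:.0f} sample periods)")
```

Two consequences follow from these numbers. RAM can only act as a staging buffer of about two seconds, so outage tolerance has to come from flash. And drift over a single outage can span on the order of a thousand sample periods, which argues for using sequence numbers rather than device timestamps as the per-device ordering key.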