Context
Alameda County Community Food Bank relies on internal donation and disaster-response operations data to keep fundraising, inventory planning, and partner coordination current during high-stress periods. Today, donation events from the online donation platform, CRM updates, and warehouse/partner operations feeds are processed mostly through scheduled batch jobs into the Food Bank's reporting tables; during peak campaigns or emergency response, delayed or failed runs create operational blind spots.
Design a highly available data pipeline that keeps critical internal donation and response data fresh even during traffic spikes, partial infrastructure failures, and upstream schema changes. Assume the Food Bank wants a primary streaming path with a batch replay/backfill path.
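Before detailing either path, it helps to fix the record shape both paths share. One option (an assumption beyond the brief, which only fixes S3 and Airflow) is to wrap every source event in a common envelope with a stable event_id and an explicit schema_version, published to a stream and mirrored verbatim to the S3 raw landing zone. A minimal producer sketch, assuming Amazon Kinesis Data Streams and a hypothetical stream named donation-events:

```python
import json
import uuid
from datetime import datetime, timezone

import boto3  # Kinesis is an assumption; the brief only fixes S3 and Airflow

kinesis = boto3.client("kinesis")

def publish_event(source: str, payload: dict, natural_id: str | None = None) -> str:
    """Wrap a raw source event in a common envelope and publish it.

    Prefer a natural id from the source (e.g., the donation transaction id)
    so producer retries do not mint new ids; fall back to a random UUID for
    sources that lack one.
    """
    envelope = {
        "event_id": natural_id or str(uuid.uuid4()),  # stable key for dedup
        "event_time": datetime.now(timezone.utc).isoformat(),
        "source": source,  # e.g. "donation_platform", "crm", "warehouse_api"
        "schema_version": 1,  # bumped explicitly on upstream schema changes
        "payload": payload,
    }
    kinesis.put_record(
        StreamName="donation-events",       # hypothetical stream name
        Data=json.dumps(envelope).encode(),
        PartitionKey=envelope["event_id"],  # spreads load across shards
    )
    return envelope["event_id"]
```

Because the batch path replays these exact envelopes from S3, the stable event_id lets downstream dedup treat a replayed record and its original as the same event.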
Scale Requirements
- Peak throughput: 8,000 donation or operational events/sec during disaster-response campaigns; 800/sec average (a sizing check follows this list)
- Event size: 1-4 KB JSON payloads
- Latency target: P95 source-to-dashboard freshness under 2 minutes
- Daily volume: 150-250 GB of raw data during normal periods; up to 1.5 TB/day during emergencies
- Retention: 1 year for raw immutable data; 7 years for curated finance/audit tables
- Availability target: 99.95% for the pipeline serving internal operations dashboards
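These figures translate directly into capacity numbers worth checking before choosing services. A back-of-envelope check (the per-shard limits are Kinesis Data Streams' published write quotas; Kinesis itself is an assumption, since the brief only fixes S3 and Airflow):

```python
# Back-of-envelope sizing from the scale requirements above.

PEAK_EVENTS_PER_SEC = 8_000
AVG_EVENTS_PER_SEC = 800
MAX_EVENT_KB = 4
AVG_EVENT_KB = 2.5  # midpoint of the 1-4 KB payload range

# Peak write bandwidth: 8,000 ev/s * 4 KB ~= 31 MB/s worst case.
peak_mb_per_sec = PEAK_EVENTS_PER_SEC * MAX_EVENT_KB / 1024
print(f"peak ingest: {peak_mb_per_sec:.1f} MB/s")

# Kinesis Data Streams per-shard write quotas.
SHARD_MB_PER_SEC = 1
SHARD_RECORDS_PER_SEC = 1_000

shards_by_bytes = peak_mb_per_sec / SHARD_MB_PER_SEC             # ~31
shards_by_records = PEAK_EVENTS_PER_SEC / SHARD_RECORDS_PER_SEC  # 8
print(f"shards needed at peak: {max(shards_by_bytes, shards_by_records):.0f}")

# Average-day raw volume: 800 ev/s * ~2.5 KB * 86,400 s ~= 165 GB/day,
# consistent with the stated 150-250 GB normal range.
avg_gb_per_day = AVG_EVENTS_PER_SEC * AVG_EVENT_KB * 86_400 / 1024**2
print(f"average raw volume: {avg_gb_per_day:.0f} GB/day")
```

The byte quota, not the record quota, binds at peak, so roughly 32 shards plus headroom (or on-demand capacity mode) is the starting point.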
Requirements
- Ingest events from Alameda County Community Food Bank donation systems, CRM exports, and warehouse/partner APIs with no single point of failure.
- Support both real-time processing and replayable batch recovery for missed windows or downstream outages.
- Enforce schema validation, deduplication, and idempotent writes for donation transactions and inventory-impacting events (a validation/idempotency sketch follows this list).
- Deliver curated tables for fundraising, finance reconciliation, and disaster-response dashboards.
- Define the orchestration, failover, and backfill strategy across streaming and batch paths (an Airflow replay sketch appears below).
- Describe monitoring, alerting, and on-call response for freshness, lag, and data-quality incidents (a freshness-check sketch appears below).
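For the validation, dedup, and idempotency requirement, a common pattern is: validate the envelope against a versioned schema, then use a conditional write as an idempotency guard keyed on event_id. The sketch below assumes the jsonschema library and a hypothetical DynamoDB table named processed_events; neither is mandated by the brief, and the downstream hooks are stubs:

```python
import json

import boto3
from botocore.exceptions import ClientError
from jsonschema import ValidationError, validate  # pip install jsonschema

# Hypothetical envelope schema; real schemas would be versioned and loaded
# from a registry or S3 so upstream changes are handled explicitly.
ENVELOPE_SCHEMA = {
    "type": "object",
    "required": ["event_id", "event_time", "source", "payload"],
    "properties": {
        "event_id": {"type": "string"},
        "event_time": {"type": "string"},
        "source": {"type": "string"},
        "payload": {"type": "object"},
    },
}

dynamodb = boto3.client("dynamodb")

def send_to_dead_letter(event: dict) -> None:
    """Stub: quarantine invalid records for review instead of dropping them."""
    print("quarantined:", event.get("event_id"))

def write_to_curated_tables(event: dict) -> None:
    """Stub: the real implementation upserts into the curated tables."""
    print("written:", event["event_id"])

def process_once(record: bytes) -> bool:
    """Validate, dedupe, and write one event; safe to call again on replay.

    Returns True if this call performed the write, False if the record was
    invalid or already processed -- so backfills never double-count donations.
    """
    event = json.loads(record)
    try:
        validate(instance=event, schema=ENVELOPE_SCHEMA)
    except ValidationError:
        send_to_dead_letter(event)
        return False

    # Idempotency guard: the conditional put fails if event_id was seen before.
    try:
        dynamodb.put_item(
            TableName="processed_events",  # hypothetical table name
            Item={"event_id": {"S": event["event_id"]}},
            ConditionExpression="attribute_not_exists(event_id)",
        )
    except ClientError as err:
        if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return False  # duplicate from a retry or replay: skip silently
        raise

    write_to_curated_tables(event)
    return True
```

In production the guard and the curated write should be atomic (for example, by keying the curated write itself on event_id) so a crash between the two steps cannot drop a record.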
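For orchestration and backfill, Airflow is already in the stack, so the replay path can be an hourly DAG with catchup enabled: each run owns exactly one data interval of the raw S3 landing zone, and clearing past runs (or `airflow dags backfill`) replays missed windows through the same idempotent write path. A minimal sketch; the bucket name and prefix layout are hypothetical:

```python
from datetime import datetime

from airflow.decorators import dag, task

@dag(
    schedule="@hourly",
    start_date=datetime(2024, 1, 1),
    catchup=True,       # lets cleared or missed intervals re-run automatically
    max_active_runs=4,  # bound parallelism during large backfills
    tags=["replay", "donations"],
)
def replay_donation_events():
    @task
    def reprocess_window(data_interval_start=None, data_interval_end=None):
        """Re-read one hour of raw envelopes from S3 and re-apply them.

        Airflow injects the interval bounds; writes go through the same
        idempotent path as the streaming consumer, so replays cannot
        double-count donation transactions.
        """
        prefix = f"raw/donations/{data_interval_start:%Y/%m/%d/%H}/"  # hypothetical layout
        print(f"reprocessing s3://acfb-raw-landing/{prefix}")          # hypothetical bucket
        # ...list objects under the prefix and feed each record to process_once()

    reprocess_window()

replay_donation_events()
```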
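For freshness alerting, one lightweight approach is to publish the lag between now and the newest event_time landed in each curated table as a custom metric, then alarm when it crosses the 2-minute P95 target. A sketch; the CloudWatch namespace is hypothetical, and the caller is assumed to query max_event_time from the curated table:

```python
from datetime import datetime, timezone

import boto3

cloudwatch = boto3.client("cloudwatch")

FRESHNESS_SLO_SECONDS = 120  # the P95 source-to-dashboard target above

def report_freshness(table: str, max_event_time: datetime) -> float:
    """Publish the freshness lag for one curated table as a custom metric.

    A CloudWatch alarm on this metric (threshold ~120 s, with a few
    evaluation periods to avoid flapping) is what pages the on-call
    engineer; the print below is only a local breadcrumb.
    """
    lag_seconds = (datetime.now(timezone.utc) - max_event_time).total_seconds()
    cloudwatch.put_metric_data(
        Namespace="FoodBank/Pipeline",  # hypothetical namespace
        MetricData=[{
            "MetricName": "FreshnessLagSeconds",
            "Dimensions": [{"Name": "Table", "Value": table}],
            "Value": lag_seconds,
            "Unit": "Seconds",
        }],
    )
    if lag_seconds > FRESHNESS_SLO_SECONDS:
        print(f"WARN: {table} is {lag_seconds:.0f}s stale (SLO {FRESHNESS_SLO_SECONDS}s)")
    return lag_seconds
```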
Constraints
- Existing stack is AWS-centric with Amazon S3 and Apache Airflow already in use.
- Small team: 3 data engineers and shared DevOps support.
- Budget favors managed services where possible and rules out always-on, oversized clusters.
- Auditability matters: donation records must be traceable and reprocessable without double counting.