Context
Alameda County Community Food Bank needs a reliable data workflow that combines inventory movements, donation intake, and partner agency fulfillment activity from multiple internal systems into a single trusted reporting layer. Today, warehouse operations, donor intake, and agency order data are updated in separate operational tools and spreadsheets, creating reconciliation gaps, stale reporting, and inconsistent counts for on-hand inventory and fulfilled cases.
Assume the Food Bank wants a near-real-time operational view plus daily finance-grade reconciliations. The current stack is AWS-based, with PostgreSQL-backed internal applications, SFTP-delivered CSVs from legacy warehouse workflows, and a central Snowflake warehouse used for reporting.
Scale Requirements
- Sources: 6 internal systems, including warehouse inventory, donation intake, partner agency ordering, and manual adjustment feeds
- Volume: ~8M inventory transaction rows/month, ~1.2M donation records/month, ~400K agency fulfillment records/month
- Latency: operational dashboards < 10 minutes behind source systems; daily reconciled tables ready by 6:00 AM PT
- Storage: 3 years hot data in Snowflake, 7 years archived in Amazon S3
- Reliability target: 99.9% successful daily pipeline completion, no duplicate business events
Requirements
- Design ingestion for CDC from PostgreSQL systems and batch ingestion for SFTP/CSV feeds.
- Build canonical data models for
inventory_events, donations, agency_orders, and fulfillment_line_items.
- Ensure idempotent processing for retries, late-arriving files, and source replays.
- Reconcile inventory balances across receipts, transfers, adjustments, spoilage, and partner agency distributions.
- Implement data quality checks for missing SKUs, negative balances, duplicate donation IDs, and orphaned fulfillment records.
- Orchestrate both near-real-time and nightly batch dependencies with clear SLAs and backfill support.
- Expose curated tables for Alameda County Community Food Bank operational dashboards and finance reporting.
Constraints
- Small team: 3 data engineers, 1 analytics engineer
- Prefer managed AWS services and low operational overhead
- Budget ceiling: $18K/month incremental infrastructure spend
- Must preserve auditability for food traceability and historical corrections
- Some source systems cannot be modified beyond read replicas or export jobs