Context
Northstar Retail, a mid-sized e-commerce company, ingests order, payment, and fulfillment data from PostgreSQL, Stripe webhooks, and a warehouse management system into Snowflake. The analytics team has discovered inconsistent daily revenue, duplicate orders, and mismatched order statuses across downstream dashboards, so you need to redesign the pipeline and incident response process to detect, isolate, and correct inconsistent data.
The current stack uses Airflow batch jobs, Python ETL scripts, and dbt transformations, but there is limited lineage, weak validation, and no reliable replay process.
Scale Requirements
- Sources: PostgreSQL OLTP (~120M orders total), Stripe webhooks (~8M events/day), WMS SFTP drops (~250 files/day)
- Daily volume: ~450 GB raw data/day
- Batch cadence: every 15 minutes for orders/payments, hourly for fulfillment
- Latency target: analytics-ready tables available within 20 minutes of source updates
- Data quality SLA: <0.1% duplicate order records, <0.05% reconciliation variance vs source-of-truth finance reports
- Retention: 2 years in Snowflake, 90 days raw landing in S3
Requirements
- Design a pipeline that detects inconsistent data across sources before publishing curated tables.
- Implement schema validation, duplicate detection, referential integrity checks, and cross-system reconciliation.
- Ensure idempotent processing for reruns, retries, and partial backfills.
- Support quarantine of bad records while allowing valid records to continue downstream.
- Provide a root-cause workflow for investigating whether inconsistencies came from source changes, late-arriving data, ETL bugs, or transformation logic.
- Define monitoring, alerting, and rollback/reprocessing strategies.
- Produce trusted Snowflake marts for finance and operations without breaking existing BI dashboards.
Constraints
- AWS is the required cloud platform.
- Team size is 3 data engineers and 1 analytics engineer.
- Incremental monthly infrastructure budget increase is capped at $18K.
- Finance reports are SOX-relevant, so all corrections must be auditable and reproducible.