Context
FinEdge processes card payments and refunds for mid-market merchants. Its current pipeline runs nightly Airflow jobs that extract OLTP data from PostgreSQL into S3 and Snowflake. Frequent partial failures, duplicate loads, and weak access controls have led to inconsistent reports and audit concerns.
You are asked to redesign the pipeline so finance, risk, and compliance teams can rely on hourly data while meeting security and reliability requirements for PCI-adjacent payment data.
Scale Requirements
- Sources: PostgreSQL payment DB, Kafka fraud events, third-party settlement SFTP files
- Throughput: 25M payment records/day, 80 GB/day raw ingest
- Latency: Core finance tables available in Snowflake within 15 minutes of source commit
- Batch window: Hourly incremental loads, daily reconciliation by 06:00 UTC
- Retention: 7 years for curated finance data, 90 days for raw landing data
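To make the scale figures above concrete, the following back-of-envelope sketch converts them into per-second and per-batch numbers. It assumes traffic is spread evenly across the day and that 1 GB is 10**9 bytes; neither assumption comes from the requirements, and real card traffic is bursty, so peak rates will be higher than these averages.

```python
# Back-of-envelope sizing from the stated scale figures.
# Assumptions (not in the requirements): uniform traffic over the day,
# decimal units (1 TB = 1000 GB), and steady-state raw retention.
RECORDS_PER_DAY = 25_000_000
RAW_GB_PER_DAY = 80
RAW_RETENTION_DAYS = 90

SECONDS_PER_DAY = 24 * 60 * 60
HOURLY_BATCHES_PER_DAY = 24

# Average sustained rate the CDC path must keep up with.
avg_records_per_sec = RECORDS_PER_DAY / SECONDS_PER_DAY

# Size of one hourly incremental load, on average.
records_per_hourly_batch = RECORDS_PER_DAY / HOURLY_BATCHES_PER_DAY
raw_gb_per_hourly_batch = RAW_GB_PER_DAY / HOURLY_BATCHES_PER_DAY

# Raw landing data held in S3 once the 90-day window is full.
raw_landing_tb = RAW_GB_PER_DAY * RAW_RETENTION_DAYS / 1000

print(f"avg records/sec:          {avg_records_per_sec:,.0f}")
print(f"records per hourly batch: {records_per_hourly_batch:,.0f}")
print(f"raw GB per hourly batch:  {raw_gb_per_hourly_batch:.1f}")
print(f"raw landing at 90 days:   {raw_landing_tb:.1f} TB")
```

Under these assumptions the averages come to roughly 290 records per second, about 1.04 million records and 3.3 GB per hourly batch, and about 7.2 TB of raw landing data at steady state.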
Requirements
- Design an ingestion and transformation pipeline for hourly incremental loads from PostgreSQL CDC, Kafka fraud events, and settlement files.
- Ensure reliability through idempotent processing, replay/backfill support, checkpointing, and safe recovery from partial failures.
- Ensure security with encryption in transit and at rest, RBAC, secrets management, audit logging, and PII/tokenized field handling.
- Implement data quality checks for schema drift, duplicate transactions, reconciliation mismatches, and null/invalid business keys.
- Produce analytics-ready tables in Snowflake for payments, refunds, chargebacks, and settlements.
- Orchestrate dependencies so downstream reconciliation runs only after all hourly loads and validations succeed.
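The idempotency, validation, and orchestration requirements above can be illustrated with a small in-memory sketch. It is not a proposed implementation: the `ChangeEvent` fields, the use of a source log position (`lsn`) as the ordering key, and the task names in `REQUIRED_LOADS` are all hypothetical. In a real pipeline the upsert step would more likely be a warehouse-side merge keyed on the business key.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One CDC row change. Field names are hypothetical."""
    payment_id: Optional[str]  # business key
    lsn: int                   # source log position, assumed to increase over time
    amount_cents: int
    status: str


def validate(batch: Iterable[ChangeEvent]) -> List[str]:
    """Return data quality failures for null or empty business keys."""
    return [f"null business key at lsn={ev.lsn}"
            for ev in batch if not ev.payment_id]


def apply_batch(target: Dict[str, ChangeEvent],
                batch: Iterable[ChangeEvent]) -> int:
    """Upsert a batch into target, keeping the highest LSN per key.

    The whole batch is rejected before any mutation if validation fails,
    so a bad batch never leaves target half-updated. Replaying a batch
    is a no-op because older or equal LSNs never overwrite newer rows.
    Returns the highest LSN seen, which can be stored as a checkpoint.
    """
    events = list(batch)
    errors = validate(events)
    if errors:
        raise ValueError(errors)

    checkpoint = max((ev.lsn for ev in target.values()), default=0)
    for ev in events:
        current = target.get(ev.payment_id)
        if current is None or ev.lsn > current.lsn:
            target[ev.payment_id] = ev
        checkpoint = max(checkpoint, ev.lsn)
    return checkpoint


# Hypothetical names for the three hourly load-and-validate tasks.
REQUIRED_LOADS = ("postgres_cdc", "kafka_fraud", "settlement_sftp")


def reconciliation_may_run(load_status: Dict[str, bool]) -> bool:
    """True only if every required load reported success this hour."""
    return all(load_status.get(name, False) for name in REQUIRED_LOADS)


if __name__ == "__main__":
    table: Dict[str, ChangeEvent] = {}
    batch = [
        ChangeEvent("p1", 10, 500, "AUTHORIZED"),
        ChangeEvent("p1", 12, 500, "CAPTURED"),
        ChangeEvent("p2", 11, 900, "AUTHORIZED"),
    ]
    first = apply_batch(table, batch)
    second = apply_batch(table, batch)  # replay after a simulated retry
    print(first == second, table["p1"].status)
    print(reconciliation_may_run({"postgres_cdc": True, "kafka_fraud": True}))
```

Keeping only the highest log position per key is one way to make replays and backfills safe under at-least-once delivery; a design that needs full change history would append events instead and deduplicate at read time.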
Constraints
- AWS is the mandated cloud; existing tools include Airflow, S3, and Snowflake.
- Incremental budget is capped at $30K/month.
- Team size is 3 data engineers and 1 platform engineer.
- Must support SOC 2 audit evidence and PCI-aligned controls; raw PAN data cannot be stored in the warehouse.
- Source PostgreSQL cannot tolerate heavy read load or long-running full extracts.
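The constraint that raw PAN data cannot reach the warehouse suggests a defensive check before load. The sketch below scans free-text fields for digit runs that pass the Luhn checksum and masks all but the last four digits. The function names and the masking policy are assumptions for illustration; a check like this would complement upstream tokenization, not replace it, and it will not catch PANs written with spaces or dashes.

```python
import re

# Card numbers are 13 to 19 digits long; match standalone digit runs only.
PAN_CANDIDATE = re.compile(r"(?<!\d)\d{13,19}(?!\d)")


def luhn_valid(number: str) -> bool:
    """Return True if a string of digits passes the Luhn checksum."""
    total = 0
    for position, char in enumerate(reversed(number)):
        digit = int(char)
        if position % 2 == 1:   # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def redact_pans(text: str) -> str:
    """Mask Luhn-valid digit runs, keeping only the last four digits.

    Digit runs that fail the checksum (order numbers, timestamps) are
    left untouched to limit false positives.
    """
    def mask(match: "re.Match[str]") -> str:
        value = match.group(0)
        if not luhn_valid(value):
            return value
        return "*" * (len(value) - 4) + value[-4:]

    return PAN_CANDIDATE.sub(mask, text)


if __name__ == "__main__":
    # 4111111111111111 is a widely used test card number, not a real PAN.
    print(redact_pans("memo: card 4111111111111111 declined"))
    print(redact_pans("memo: ref 4111111111111112 declined"))
```

Running the same scan as a data quality check on landed files, and failing the load when any match is found, would also produce a record of enforcement that can serve as audit evidence.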