Context
LedgerLoop, a mid-market fintech company, generates daily financial reports for revenue, refunds, fees, and account balances from payment processors, banking feeds, and an internal ledger service. The current process relies on ad hoc SQL jobs and spreadsheet reconciliations, causing inconsistent numbers across Finance, Risk, and Executive dashboards.
Design a production-grade batch pipeline that produces accurate, auditable financial reporting tables in the warehouse and supports month-end close without manual fixes. Assume the company is AWS-based and wants a reliable daily reporting SLA with clear reconciliation controls.
Scale Requirements
- Sources: 3 payment processors, 2 bank feeds, 1 internal ledger database
- Volume: 120M transaction records/day, ~350 GB raw compressed data/day
- Latency: Daily reports ready by 6:00 AM UTC; critical reconciliation reruns within 30 minutes
- Retention: 7 years for raw and curated financial data
- Accuracy target: 99.99% record completeness; zero unexplained balance deltas above $100
Requirements
- Ingest daily files and CDC extracts from external processors, bank statements, and the internal ledger into a raw zone.
- Standardize schemas, currencies, timestamps, and transaction states across all sources.
- Build reconciliation logic for gross volume, net settlements, refunds, chargebacks, and ending balances.
- Ensure idempotent reruns and support backfills for late-arriving corrections up to 90 days.
- Publish finance-ready fact tables and daily reporting aggregates to Snowflake.
- Implement data quality checks for completeness, duplicate transactions, referential integrity, and balance matching.
- Provide lineage, auditability, and failure recovery suitable for SOX-style controls.
Constraints
- Compliance: Must support audit trails, immutable raw storage, and PII minimization
- Infrastructure: Use AWS services already in place; avoid introducing more than one new major platform
- Budget: Incremental cloud spend should stay under $18K/month
- Team: 3 data engineers, 1 analytics engineer, limited on-call coverage