You are redesigning a merchant reporting system for a payments platform that must show balances, payouts, fees, disputes, and refunds with both near-real-time visibility and finance-grade daily accuracy. The current reporting stack was built incrementally and now produces mismatched numbers between dashboard views, CSV exports, and month-end finance reports. An executive escalation was triggered after several large merchants found payout totals that did not reconcile to underlying transactions during peak sales periods. You need a pipeline design that supports operational reporting within minutes while preserving replayability, auditability, and deterministic backfills.
| Component | Status / Technology |
|---|---|
| Event Sources | PaymentIntents, Charges, Refunds, Disputes, Balance Transactions, Payouts APIs and internal service events |
| Ingestion | Mixed webhooks, service DB CDC, and hourly object storage drops |
| Stream Processing | Limited Flink jobs for fee enrichment |
| Batch Processing | Nightly Spark jobs building warehouse tables |
| Warehouse | Snowflake with ad hoc SQL models |
| Orchestration | Apache Airflow 2.x |
| Serving | Merchant dashboard APIs, Sigma-style ad hoc reporting, CSV exports |
Scale: 8-15M financial events/day, 2-4x peak bursts, ~1.5 KB average payload, 5-minute freshness target for operational metrics, T+1 finalization for finance reports, 7-year audit retention.
How would you design the end-to-end reporting pipeline so merchants can query near-real-time and finalized financial views consistently across dashboard, exports, and warehouse reporting surfaces? Explain the architecture, data model, orchestration, and controls you would use to handle corrections, late-arriving events, and backfills at this scale.