Context
Anrok’s tax platform receives financial state changes from distributed microservices such as billing, invoice generation, tax calculation, refunds, and ledger adjustments. Today, each service writes independently to its own operational store and downstream analytics tables, which creates reconciliation gaps, duplicate postings, and delayed visibility into tax liabilities.
You need to design a pipeline that guarantees consistent, replayable financial records across services while feeding Anrok’s internal reporting and reconciliation surfaces. The design should support both near-real-time operational visibility and batch-safe backfills without corrupting the financial ledger.
Scale Requirements
- Throughput: 25K financial events/second peak, 5K/second sustained
- Event size: 1-4 KB JSON/Avro per event
- Latency target: P95 < 60 seconds from service commit to queryable warehouse record
- Daily volume: ~1.2B events/day, ~2.5 TB raw/day
- Retention: 7 years immutable financial history, 90 days hot replay storage
- Accuracy target: 0 missing committed events, duplicate posting rate < 1 per 100M events
Requirements
- Design an ingestion pattern that preserves ordering where needed for financial entities, keyed by identifiers such as invoice_id or transaction_id (see the keyed-producer sketch after this list).
- Ensure exactly-once, or at minimum effectively-once, processing across microservices, stream processors, and warehouse loads (idempotent-sink sketch below).
- Model append-only financial events and derive current-state tables for Anrok reporting without losing auditability (current-state sketch below).
- Support late-arriving corrections, reversals, refunds, and replay/backfill workflows (compensating-event sketch below).
- Implement reconciliation between source-of-truth service data and warehouse ledger outputs (reconciliation sketch below).
- Define orchestration for streaming jobs, batch repair jobs, and downstream dbt transformations (Airflow DAG sketch below).
- Include monitoring, alerting, and failure recovery covering data loss, schema drift, and out-of-balance financial records (schema-drift check sketch below).
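To ground the ordering requirement, here is a minimal keyed-producer sketch: hashing the entity identifier to the partition gives per-entity ordering inside Kafka without paying for a global order. The topic name, broker address, and event fields are illustrative assumptions, not Anrok's actual schema.

```python
# Keyed Kafka ingestion sketch: events for the same invoice_id always hash to
# the same partition, so Kafka preserves their relative order. Topic name,
# field names, and broker address are illustrative assumptions.
import json
from confluent_kafka import Producer

producer = Producer({
    "bootstrap.servers": "kafka:9092",
    "enable.idempotence": True,   # no duplicates from producer retries
    "acks": "all",                # wait for full ISR ack before success
})

def publish_financial_event(event: dict) -> None:
    """Publish one event, keyed by the entity whose order matters."""
    key = event["invoice_id"]     # or transaction_id for payment flows
    producer.produce(
        topic="financial-events",
        key=key.encode("utf-8"),
        value=json.dumps(event).encode("utf-8"),
        on_delivery=lambda err, msg: err and print(f"delivery failed: {err}"),
    )

publish_financial_event({
    "event_id": "evt-001",
    "invoice_id": "inv-42",
    "type": "invoice_issued",
    "amount_cents": 12500,
})
producer.flush()
```

Keying by invoice_id serializes one invoice's lifecycle through a single partition while unrelated invoices still fan out across partitions for throughput.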
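For the exactly-once bullet, the usual pattern under the no-2PC constraint is at-least-once delivery plus an idempotent sink: every event carries a globally unique event_id, and the load inserts on that id as a key so redeliveries are no-ops. A runnable sketch with sqlite3 standing in for the warehouse; in Snowflake the same contract would be a MERGE keyed on event_id.

```python
# Effectively-once sketch: at-least-once delivery + idempotent write.
# A redelivered event hits the event_id primary key and is silently skipped,
# so duplicates never become duplicate ledger rows. sqlite3 stands in for
# the warehouse; in Snowflake the same contract is a MERGE on event_id.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE ledger_events (
        event_id     TEXT PRIMARY KEY,   -- dedup key
        invoice_id   TEXT NOT NULL,
        amount_cents INTEGER NOT NULL
    )
""")

def apply_event(event: dict) -> bool:
    """Insert once; return False if this event_id was already applied."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO ledger_events VALUES (?, ?, ?)",
        (event["event_id"], event["invoice_id"], event["amount_cents"]),
    )
    conn.commit()
    return cur.rowcount == 1

evt = {"event_id": "evt-001", "invoice_id": "inv-42", "amount_cents": 12500}
assert apply_event(evt) is True    # first delivery lands
assert apply_event(evt) is False   # redelivery is a no-op
```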
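For the append-only model, the derived reporting tables are a pure function of the event log, so they can always be rebuilt without touching history. In dbt this would typically be a window-function model (latest row per invoice); the pure-Python fold below shows the same idea on assumed fields.

```python
# Append-only -> current-state sketch: events are never mutated; the
# reporting table is a pure function of the log, so it can always be
# rebuilt (replayed) while the log keeps full auditability.
from collections import defaultdict

events = [  # append-only log, ordered per invoice by a sequence number
    {"invoice_id": "inv-42", "seq": 1, "type": "invoice_issued", "amount_cents": 12500},
    {"invoice_id": "inv-42", "seq": 2, "type": "tax_applied",    "amount_cents": 1063},
    {"invoice_id": "inv-42", "seq": 3, "type": "refund",         "amount_cents": -12500},
]

def derive_current_state(log: list[dict]) -> dict[str, dict]:
    """Fold the log into one row per invoice: balance + last event seen."""
    state: dict[str, dict] = defaultdict(lambda: {"balance_cents": 0, "last_seq": 0})
    for e in sorted(log, key=lambda e: (e["invoice_id"], e["seq"])):
        row = state[e["invoice_id"]]
        row["balance_cents"] += e["amount_cents"]
        row["last_seq"] = e["seq"]
    return dict(state)

print(derive_current_state(events))
# {'inv-42': {'balance_cents': 1063, 'last_seq': 3}}
```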
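Corrections and reversals fit the same model as compensating events: a reversal is appended, references the original event, and negates its amount, so late arrivals and replays change the derived balance without rewriting the ledger. Event types and fields are assumptions:

```python
# Compensating-event sketch: a correction never edits history; it appends
# a reversal pointing at the original event_id. Re-running the fold after
# a late arrival just produces the corrected balance -- no ledger surgery.

def make_reversal(original: dict, reversal_id: str, seq: int) -> dict:
    """Build an event that exactly cancels the original's financial effect."""
    return {
        "event_id": reversal_id,
        "invoice_id": original["invoice_id"],
        "seq": seq,
        "type": "reversal",
        "reverses": original["event_id"],          # audit link to the original
        "amount_cents": -original["amount_cents"],
    }

log = [
    {"event_id": "evt-1", "invoice_id": "inv-7", "seq": 1,
     "type": "invoice_issued", "amount_cents": 9900},
]
# A late-arriving correction: the charge was wrong, so reverse and re-issue.
log.append(make_reversal(log[0], reversal_id="evt-2", seq=2))
log.append({"event_id": "evt-3", "invoice_id": "inv-7", "seq": 3,
            "type": "invoice_issued", "amount_cents": 8900})

balance = sum(e["amount_cents"] for e in log)
assert balance == 8900  # corrected total, with all three events auditable
```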
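Reconciliation can run as a periodic control that computes the same aggregate independently from the source-of-truth services and from the warehouse ledger, then raises on any delta. A minimal sketch over assumed per-day net totals:

```python
# Reconciliation sketch: per-day net totals from the source services and
# from the warehouse ledger are computed independently; any delta means
# missing, duplicated, or mis-posted events and should page the on-call.

def reconcile(source_totals: dict[str, int],
              warehouse_totals: dict[str, int]) -> list[dict]:
    """Return one discrepancy record per out-of-balance day."""
    discrepancies = []
    for day in sorted(set(source_totals) | set(warehouse_totals)):
        src = source_totals.get(day, 0)
        wh = warehouse_totals.get(day, 0)
        if src != wh:
            discrepancies.append(
                {"day": day, "source_cents": src,
                 "warehouse_cents": wh, "delta_cents": wh - src}
            )
    return discrepancies

source = {"2024-06-01": 125_000, "2024-06-02": 98_000}
warehouse = {"2024-06-01": 125_000, "2024-06-02": 97_200}  # a posting is missing
for d in reconcile(source, warehouse):
    print(f"OUT OF BALANCE {d['day']}: delta {d['delta_cents']} cents")
```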
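For orchestration, one plausible split under the team's stack: streaming jobs run continuously outside Airflow, while a daily Airflow DAG owns batch repair, dbt rebuilds, and the reconciliation control. The DAG id, schedule, and commands below are assumptions, not a prescribed layout.

```python
# Airflow orchestration sketch (Airflow 2.x): the batch side of the
# pipeline as one daily DAG. Streaming ingestion runs outside Airflow;
# this DAG repairs gaps, rebuilds dbt models, then reconciles.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="financial_ledger_daily",        # illustrative id
    start_date=datetime(2024, 1, 1),
    schedule="30 2 * * *",                  # after the day's loads settle
    catchup=False,
    default_args={"retries": 2},
) as dag:
    repair_gaps = BashOperator(
        task_id="repair_gaps",
        bash_command="python -m pipeline.backfill --mode=repair",  # assumed entrypoint
    )
    build_models = BashOperator(
        task_id="dbt_build",
        bash_command="dbt build --select ledger",   # rebuild derived tables
    )
    reconcile = BashOperator(
        task_id="reconcile_ledger",
        bash_command="python -m pipeline.reconcile --window=1d",   # assumed entrypoint
    )
    repair_gaps >> build_models >> reconcile
```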
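On the monitoring bullet, one cheap, high-value control is validating every event against the expected contract at the consumer edge and dead-lettering anything that drifts, alongside count-based completeness checks for data loss. A sketch of the drift check, with an assumed contract:

```python
# Schema-drift check sketch: each event is validated against the expected
# contract before it can reach the ledger; drifted events go to a
# dead-letter queue for inspection instead of silently corrupting tables.
EXPECTED_FIELDS = {          # assumed contract for financial events
    "event_id": str,
    "invoice_id": str,
    "seq": int,
    "type": str,
    "amount_cents": int,
}

def check_schema(event: dict) -> list[str]:
    """Return a list of drift problems; empty means the event conforms."""
    problems = [f"missing field: {f}" for f in EXPECTED_FIELDS if f not in event]
    problems += [f"unexpected field: {f}" for f in event if f not in EXPECTED_FIELDS]
    problems += [
        f"bad type for {f}: {type(event[f]).__name__}"
        for f, t in EXPECTED_FIELDS.items()
        if f in event and not isinstance(event[f], t)
    ]
    return problems

dead_letter: list[tuple[dict, list[str]]] = []
event = {"event_id": "evt-9", "invoice_id": "inv-1", "seq": 4,
         "type": "refund", "amount_cents": "-500"}   # wrong type: str, not int
if issues := check_schema(event):
    dead_letter.append((event, issues))  # alert + hold for manual repair
```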
Constraints
- AWS-first environment; prefer managed services where operationally justified
- SOX-style auditability and strict access controls for financial data
- No distributed two-phase commit (2PC) across microservices
- Team can support Kafka, Airflow, dbt, and Snowflake, but headcount is limited
- Backfills must not block real-time processing or produce duplicate ledger entries
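The backfill constraint above can be satisfied by giving replays their own lane (a separate topic or consumer group, throttled) while writing through the same idempotent sink as live traffic, so a replayed event can never double-post. A self-contained sketch; the in-memory id set stands in for the warehouse's event_id key:

```python
# Backfill sketch: replays run on their own lane (their own topic /
# consumer group, throttled) and write through the same idempotent sink
# as live traffic, so a backfill can re-send history freely without
# double-posting or starving the real-time path.
import time

applied_ids: set[str] = set()   # stands in for the warehouse's event_id key

def apply_idempotent(event: dict) -> bool:
    """Same contract as the live sink: a second delivery is a no-op."""
    if event["event_id"] in applied_ids:
        return False
    applied_ids.add(event["event_id"])
    return True

def run_backfill(archived_events: list[dict], max_per_sec: int = 1000) -> dict:
    """Replay history at a capped rate; live consumers keep their own lane."""
    stats = {"applied": 0, "skipped_duplicates": 0}
    for i, event in enumerate(archived_events):
        if apply_idempotent(event):
            stats["applied"] += 1
        else:
            stats["skipped_duplicates"] += 1
        if (i + 1) % max_per_sec == 0:
            time.sleep(1)       # crude throttle: don't crowd out live traffic
    return stats

# Replaying a window that partially overlaps already-loaded events is safe:
apply_idempotent({"event_id": "evt-1"})          # already live-loaded
print(run_backfill([{"event_id": "evt-1"}, {"event_id": "evt-2"}]))
# {'applied': 1, 'skipped_duplicates': 1}
```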