Context
Anrok’s tax platform receives financial state changes from distributed microservices such as billing, invoice generation, tax calculation, refunds, and ledger adjustments. Today, each service writes independently to its own operational store and downstream analytics tables, which creates reconciliation gaps, duplicate postings, and delayed visibility into tax liabilities.
You need to design a pipeline that guarantees consistent, replayable financial records across services while feeding Anrok’s internal reporting and reconciliation surfaces. The design should support both near-real-time operational visibility and batch-safe backfills without corrupting the financial ledger.
Scale Requirements
- Throughput: 25K financial events/second peak, 5K/second sustained
- Event size: 1-4 KB JSON/Avro per event
- Latency target: P95 < 60 seconds from service commit to queryable warehouse record
- Daily volume: ~1.2B events/day, ~2.5 TB raw/day
- Retention: 7 years immutable financial history, 90 days hot replay storage
- Accuracy target: 0 missing committed events, duplicate posting rate < 1 per 100M events
Requirements
- Design an ingestion pattern that preserves ordering where needed for financial entities, keyed by identifiers such as invoice_id or transaction_id (see the keyed-producer sketch after this list).
- Ensure exactly-once, or at minimum effectively-once, processing across microservices, stream processors, and warehouse loads (idempotent-sink sketch below).
- Model append-only financial events and derive current-state tables for Anrok reporting without losing auditability (current-state sketch below).
- Support late-arriving corrections, reversals, refunds, and replay/backfill workflows (compensating-event sketch below).
- Implement reconciliation between source-of-truth service data and warehouse ledger outputs (reconciliation sketch below).
- Define orchestration for streaming jobs, batch repair jobs, and downstream dbt transformations (Airflow DAG sketch below).
- Include monitoring, alerting, and failure recovery covering data loss, schema drift, and out-of-balance financial records (schema-drift check sketch below).
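To ground the ordering requirement, here is a minimal keyed-producer sketch: hashing the entity identifier to the partition gives per-entity ordering inside Kafka without paying for a global order. The topic name, broker address, and event fields are illustrative assumptions, not Anrok's actual schema.

```python
# Keyed Kafka ingestion sketch: events for the same invoice_id always hash to
# the same partition, so Kafka preserves their relative order. Topic name,
# field names, and broker address are illustrative assumptions.
import json
from confluent_kafka import Producer

producer = Producer({
    "bootstrap.servers": "kafka:9092",
    "enable.idempotence": True,   # no duplicates from producer retries
    "acks": "all",                # wait for full ISR ack before success
})

def publish_financial_event(event: dict) -> None:
    """Publish one event, keyed by the entity whose order matters."""
    key = event["invoice_id"]     # or transaction_id for payment flows
    producer.produce(
        topic="financial-events",
        key=key.encode("utf-8"),
        value=json.dumps(event).encode("utf-8"),
        on_delivery=lambda err, msg: err and print(f"delivery failed: {err}"),
    )

publish_financial_event({
    "event_id": "evt-001",
    "invoice_id": "inv-42",
    "type": "invoice_issued",
    "amount_cents": 12500,
})
producer.flush()
```

Keying by invoice_id serializes one invoice's lifecycle through a single partition while unrelated invoices still fan out across partitions for throughput.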
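For the exactly-once bullet, the usual pattern under the no-2PC constraint is at-least-once delivery plus an idempotent sink: every event carries a globally unique event_id, and the load inserts on that id as a key so redeliveries are no-ops. A runnable sketch with sqlite3 standing in for the warehouse; in Snowflake the same contract would be a MERGE keyed on event_id.

```python
# Effectively-once sketch: at-least-once delivery + idempotent write.
# A redelivered event hits the event_id primary key and is silently skipped,
# so duplicates never become duplicate ledger rows. sqlite3 stands in for
# the warehouse; in Snowflake the same contract is a MERGE on event_id.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE ledger_events (
        event_id     TEXT PRIMARY KEY,   -- dedup key
        invoice_id   TEXT NOT NULL,
        amount_cents INTEGER NOT NULL
    )
""")

def apply_event(event: dict) -> bool:
    """Insert once; return False if this event_id was already applied."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO ledger_events VALUES (?, ?, ?)",
        (event["event_id"], event["invoice_id"], event["amount_cents"]),
    )
    conn.commit()
    return cur.rowcount == 1

evt = {"event_id": "evt-001", "invoice_id": "inv-42", "amount_cents": 12500}
assert apply_event(evt) is True    # first delivery lands
assert apply_event(evt) is False   # redelivery is a no-op
```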
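For the append-only model, the derived reporting tables are a pure function of the event log, so they can always be rebuilt without touching history. In dbt this would typically be a window-function model (latest row per invoice); the pure-Python fold below shows the same idea on assumed fields.

```python
# Append-only -> current-state sketch: events are never mutated; the
# reporting table is a pure function of the log, so it can always be
# rebuilt (replayed) while the log keeps full auditability.
from collections import defaultdict

events = [  # append-only log, ordered per invoice by a sequence number
    {"invoice_id": "inv-42", "seq": 1, "type": "invoice_issued", "amount_cents": 12500},
    {"invoice_id": "inv-42", "seq": 2, "type": "tax_applied",    "amount_cents": 1063},
    {"invoice_id": "inv-42", "seq": 3, "type": "refund",         "amount_cents": -12500},
]

def derive_current_state(log: list[dict]) -> dict[str, dict]:
    """Fold the log into one row per invoice: balance + last event seen."""
    state: dict[str, dict] = defaultdict(lambda: {"balance_cents": 0, "last_seq": 0})
    for e in sorted(log, key=lambda e: (e["invoice_id"], e["seq"])):
        row = state[e["invoice_id"]]
        row["balance_cents"] += e["amount_cents"]
        row["last_seq"] = e["seq"]
    return dict(state)

print(derive_current_state(events))
# {'inv-42': {'balance_cents': 1063, 'last_seq': 3}}
```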
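Corrections and reversals fit the same model as compensating events: a reversal is appended, references the original event, and negates its amount, so late arrivals and replays change the derived balance without rewriting the ledger. Event types and fields are assumptions:

```python
# Compensating-event sketch: a correction never edits history; it appends
# a reversal pointing at the original event_id. Re-running the fold after
# a late arrival just produces the corrected balance -- no ledger surgery.

def make_reversal(original: dict, reversal_id: str, seq: int) -> dict:
    """Build an event that exactly cancels the original's financial effect."""
    return {
        "event_id": reversal_id,
        "invoice_id": original["invoice_id"],
        "seq": seq,
        "type": "reversal",
        "reverses": original["event_id"],          # audit link to the original
        "amount_cents": -original["amount_cents"],
    }

log = [
    {"event_id": "evt-1", "invoice_id": "inv-7", "seq": 1,
     "type": "invoice_issued", "amount_cents": 9900},
]
# A late-arriving correction: the charge was wrong, so reverse and re-issue.
log.append(make_reversal(log[0], reversal_id="evt-2", seq=2))
log.append({"event_id": "evt-3", "invoice_id": "inv-7", "seq": 3,
            "type": "invoice_issued", "amount_cents": 8900})

balance = sum(e["amount_cents"] for e in log)
assert balance == 8900  # corrected total, with all three events auditable
```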
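Reconciliation can run as a periodic control that computes the same aggregate independently from the source-of-truth services and from the warehouse ledger, then raises on any delta. A minimal sketch over assumed per-day net totals:

```python
# Reconciliation sketch: per-day net totals from the source services and
# from the warehouse ledger are computed independently; any delta means
# missing, duplicated, or mis-posted events and should page the on-call.

def reconcile(source_totals: dict[str, int],
              warehouse_totals: dict[str, int]) -> list[dict]:
    """Return one discrepancy record per out-of-balance day."""
    discrepancies = []
    for day in sorted(set(source_totals) | set(warehouse_totals)):
        src = source_totals.get(day, 0)
        wh = warehouse_totals.get(day, 0)
        if src != wh:
            discrepancies.append(
                {"day": day, "source_cents": src,
                 "warehouse_cents": wh, "delta_cents": wh - src}
            )
    return discrepancies

source = {"2024-06-01": 125_000, "2024-06-02": 98_000}
warehouse = {"2024-06-01": 125_000, "2024-06-02": 97_200}  # a posting is missing
for d in reconcile(source, warehouse):
    print(f"OUT OF BALANCE {d['day']}: delta {d['delta_cents']} cents")
```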
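For orchestration, one plausible split under the team's stack: streaming jobs run continuously outside Airflow, while a daily Airflow DAG owns batch repair, dbt rebuilds, and the reconciliation control. The DAG id, schedule, and commands below are assumptions, not a prescribed layout.

```python
# Airflow orchestration sketch (Airflow 2.x): the batch side of the
# pipeline as one daily DAG. Streaming ingestion runs outside Airflow;
# this DAG repairs gaps, rebuilds dbt models, then reconciles.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="financial_ledger_daily",        # illustrative id
    start_date=datetime(2024, 1, 1),
    schedule="30 2 * * *",                  # after the day's loads settle
    catchup=False,
    default_args={"retries": 2},
) as dag:
    repair_gaps = BashOperator(
        task_id="repair_gaps",
        bash_command="python -m pipeline.backfill --mode=repair",  # assumed entrypoint
    )
    build_models = BashOperator(
        task_id="dbt_build",
        bash_command="dbt build --select ledger",   # rebuild derived tables
    )
    reconcile = BashOperator(
        task_id="reconcile_ledger",
        bash_command="python -m pipeline.reconcile --window=1d",   # assumed entrypoint
    )
    repair_gaps >> build_models >> reconcile
```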
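On the monitoring bullet, one cheap, high-value control is validating every event against the expected contract at the consumer edge and dead-lettering anything that drifts, alongside count-based completeness checks for data loss. A sketch of the drift check, with an assumed contract:

```python
# Schema-drift check sketch: each event is validated against the expected
# contract before it can reach the ledger; drifted events go to a
# dead-letter queue for inspection instead of silently corrupting tables.
EXPECTED_FIELDS = {          # assumed contract for financial events
    "event_id": str,
    "invoice_id": str,
    "seq": int,
    "type": str,
    "amount_cents": int,
}

def check_schema(event: dict) -> list[str]:
    """Return a list of drift problems; empty means the event conforms."""
    problems = [f"missing field: {f}" for f in EXPECTED_FIELDS if f not in event]
    problems += [f"unexpected field: {f}" for f in event if f not in EXPECTED_FIELDS]
    problems += [
        f"bad type for {f}: {type(event[f]).__name__}"
        for f, t in EXPECTED_FIELDS.items()
        if f in event and not isinstance(event[f], t)
    ]
    return problems

dead_letter: list[tuple[dict, list[str]]] = []
event = {"event_id": "evt-9", "invoice_id": "inv-1", "seq": 4,
         "type": "refund", "amount_cents": "-500"}   # wrong type: str, not int
if issues := check_schema(event):
    dead_letter.append((event, issues))  # alert + hold for manual repair
```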
Constraints
- AWS-first environment; prefer managed services where operationally justified
- SOX-style auditability and strict access controls for financial data
- No distributed two-phase commit (2PC) across microservices
- Team can support Kafka, Airflow, dbt, and Snowflake, but headcount is limited
- Backfills must not block real-time processing or produce duplicate ledger entries
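The backfill constraint above can be satisfied by giving replays their own lane (a separate topic or consumer group, throttled) while writing through the same idempotent sink as live traffic, so a replayed event can never double-post. A self-contained sketch; the in-memory id set stands in for the warehouse's event_id key:

```python
# Backfill sketch: replays run on their own lane (their own topic /
# consumer group, throttled) and write through the same idempotent sink
# as live traffic, so a backfill can re-send history freely without
# double-posting or starving the real-time path.
import time

applied_ids: set[str] = set()   # stands in for the warehouse's event_id key

def apply_idempotent(event: dict) -> bool:
    """Same contract as the live sink: a second delivery is a no-op."""
    if event["event_id"] in applied_ids:
        return False
    applied_ids.add(event["event_id"])
    return True

def run_backfill(archived_events: list[dict], max_per_sec: int = 1000) -> dict:
    """Replay history at a capped rate; live consumers keep their own lane."""
    stats = {"applied": 0, "skipped_duplicates": 0}
    for i, event in enumerate(archived_events):
        if apply_idempotent(event):
            stats["applied"] += 1
        else:
            stats["skipped_duplicates"] += 1
        if (i + 1) % max_per_sec == 0:
            time.sleep(1)       # crude throttle: don't crowd out live traffic
    return stats

# Replaying a window that partially overlaps already-loaded events is safe:
apply_idempotent({"event_id": "evt-1"})          # already live-loaded
print(run_backfill([{"event_id": "evt-1"}, {"event_id": "evt-2"}]))
# {'applied': 1, 'skipped_duplicates': 1}
```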