Context
Anrok ingests transactions, invoices, credit notes, and subscription updates from third-party billing platforms into its tax platform. Today, some integrations rely on a mix of scheduled API pulls and webhook handlers. This leads to duplicate events, missed updates during provider outages, and inconsistent tax state when upstream systems are eventually consistent.
Design a resilient pipeline that keeps Anrok's billing ledger and downstream tax calculation surfaces accurate despite API rate limits, out-of-order webhooks, retries, and delayed updates from providers such as Stripe or Chargebee.
Scale Requirements
- Connected merchants: 8,000
- Providers: 6 billing systems, each with different API quotas and webhook semantics
- Peak webhook volume: 15,000 events/minute
- Backfill volume: up to 250M billing objects across 24 months
- Latency target: webhook receipt to queryable state in Anrok in < 2 minutes at P95
- Reconciliation SLA: Anrok state converges to provider state within 30 minutes for 99.9% of objects
- Storage: 15 TB raw history, 3-year retention for auditability
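
To make the figures above easier to reason about, the short sketch below converts them into per-second and per-merchant numbers. It uses only the values listed in this section. The variable names are illustrative, and reading the 99.9% target as a share of the full 250M-object backfill is an assumption made here, not something the brief states.

```python
# Back-of-envelope arithmetic using only the figures listed above.
PEAK_EVENTS_PER_MIN = 15_000
BACKFILL_OBJECTS = 250_000_000
MERCHANTS = 8_000

# Peak webhook rate expressed per second.
peak_events_per_sec = PEAK_EVENTS_PER_MIN / 60

# Average backfill size per merchant (real merchants will be heavily skewed).
avg_objects_per_merchant = BACKFILL_OBJECTS / MERCHANTS

# Assumption: if the 99.9% target is applied to the whole backfill,
# this many objects may still miss the 30-minute window.
objects_outside_sla = BACKFILL_OBJECTS * (1000 - 999) // 1000

print(peak_events_per_sec)       # 250.0
print(avg_objects_per_merchant)  # 31250.0
print(objects_outside_sla)       # 250000
```

The averages hide skew: a few large merchants will likely account for most of the backfill, so per-merchant limits matter more than the mean suggests.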
Requirements
- Ingest provider webhooks into Anrok with signature verification, replay protection, and durable storage.
- Design pull-based sync jobs that respect per-provider and per-merchant rate limits while supporting incremental sync and historical backfills.
- Handle eventual consistency: a webhook may arrive before the provider API reflects the latest object state.
- Guarantee idempotent processing for duplicate webhooks, retried API pages, and re-run backfills.
- Model raw events, canonical billing objects, and reconciliation status so downstream tax calculations read a consistent view.
- Define orchestration for near-real-time processing, scheduled reconciliation, and targeted reprocessing.
- Include data quality checks, observability, and on-call alerting.
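
As one possible illustration of the signature verification, replay protection, and idempotency requirements above, the sketch below verifies a webhook body and applies it to an in-memory ledger. Everything here is hypothetical: the signature scheme is a generic hex HMAC-SHA256 over the raw body (each real provider defines its own format), the `Ledger` class stands in for a durable store, and `version` is assumed to be some monotonically increasing value per object, such as a provider-side update timestamp.

```python
import hashlib
import hmac
from dataclasses import dataclass, field


def verify_signature(secret: bytes, payload: bytes, signature_hex: str) -> bool:
    """Constant-time check of a hex HMAC-SHA256 signature over the raw body."""
    expected = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_hex)


@dataclass
class Ledger:
    """In-memory stand-in for a durable store (hypothetical)."""
    seen_event_ids: set = field(default_factory=set)  # duplicate/replay guard
    objects: dict = field(default_factory=dict)       # object_id -> (version, state)

    def apply(self, event_id: str, object_id: str, version: int, state: dict) -> str:
        # Duplicate delivery or retry: the event was already handled.
        if event_id in self.seen_event_ids:
            return "duplicate"
        self.seen_event_ids.add(event_id)
        # Out-of-order delivery: never overwrite a newer version with an older one.
        current = self.objects.get(object_id)
        if current is not None and current[0] >= version:
            return "stale"
        self.objects[object_id] = (version, state)
        return "applied"


ledger = Ledger()
print(ledger.apply("evt_2", "inv_1", 2, {"status": "paid"}))  # applied
print(ledger.apply("evt_1", "inv_1", 1, {"status": "open"}))  # stale
print(ledger.apply("evt_2", "inv_1", 2, {"status": "paid"}))  # duplicate
```

Keying deduplication on the event ID and ordering on a per-object version means re-run backfills and retried API pages can pass through the same `apply` path without changing the final state. A version comparison alone does not cover the case where a webhook arrives before the provider API reflects the change; that still needs a delayed re-fetch or the scheduled reconciliation pass.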
Constraints
- AWS-first stack; prefer managed services where possible.
- Team of 5 engineers; operational simplicity matters more than perfect theoretical throughput.
- Must support SOC 2 auditability and immutable raw event retention.
- Budget target: <$35K/month incremental infrastructure cost.
- Some providers expose only cursor-based pagination and strict burst limits (for example, 100 requests/sec/account).
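
The cursor-pagination and burst-limit constraint above can be sketched with a token bucket in front of a paginated fetch loop. This is a minimal single-process sketch under stated assumptions, not a production design: `TokenBucket`, `fetch_all`, and the `fetch_page(cursor) -> (items, next_cursor)` callable are hypothetical names, and a real deployment would need the bucket state shared across workers.

```python
import time
from typing import Callable, Iterator, Optional, Tuple


class TokenBucket:
    """Allows bursts up to `capacity` and refills at `rate_per_sec`."""

    def __init__(self, rate_per_sec: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = capacity
        self.clock = clock
        self.sleep = sleep
        self.last = clock()

    def acquire(self) -> None:
        """Block until one token is available, then consume it."""
        while True:
            now = self.clock()
            elapsed = now - self.last
            self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return
            # Sleep just long enough for the missing fraction of a token.
            self.sleep((1 - self.tokens) / self.rate)


def fetch_all(fetch_page: Callable[[Optional[str]], Tuple[list, Optional[str]]],
              bucket: TokenBucket,
              cursor: Optional[str] = None) -> Iterator:
    """Follow cursors until the provider returns no next cursor."""
    while True:
        bucket.acquire()  # one token per API request
        items, cursor = fetch_page(cursor)
        yield from items
        if cursor is None:
            return


# Usage with a fake two-page provider (hypothetical data).
pages = {None: (["inv_1", "inv_2"], "c1"), "c1": (["inv_3"], None)}
bucket = TokenBucket(rate_per_sec=100, capacity=100)
print(list(fetch_all(lambda c: pages[c], bucket)))  # ['inv_1', 'inv_2', 'inv_3']
```

Keeping one bucket per (provider, merchant account) pair lets a large backfill for one merchant slow down without starving incremental sync for others. Persisting the last cursor after each page would let an interrupted backfill resume, which pairs with idempotent writes so that a re-fetched page is harmless.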