Context
LedgerFlow, a B2B payments platform, exposes a core API for payment creation, refunds, and ledger mutations. Today, retries from clients, load balancers, and downstream workers can produce duplicate side effects, and the current batch reconciliation process detects issues hours later. You need to design an idempotency framework as a data pipeline problem: capture requests, deduplicate safely across synchronous API and asynchronous processing, and provide replay, auditability, and monitoring.
Scale Requirements
- Traffic: 35K API requests/second peak, 8K average
- Payload size: 1-8 KB JSON per request
- Idempotency window: 72 hours for external clients; audit retention: 30 days in hot storage, 1 year in cold storage
- Latency target: P99 API overhead from idempotency checks < 25 ms
- Correctness: no duplicate side effects for committed operations under retries, worker restarts, or network timeouts
- Storage: ~2.5B idempotency records/month
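A quick back-of-envelope check of the numbers above. The ~5 KB average record size is an assumption (midpoint of the 1-8 KB payload range plus a small allowance for key, state, and audit metadata):

```python
# Back-of-envelope sizing for the idempotency store.
# ASSUMPTION: avg stored record ~= 4.5 KB payload midpoint + ~0.5 KB
# of key/state/audit metadata => ~5 KB per record.
RECORDS_PER_MONTH = 2.5e9
AVG_RECORD_KB = 5

hot_tb_per_month = RECORDS_PER_MONTH * AVG_RECORD_KB / 1e9  # KB -> TB (decimal)
print(f"hot storage growth: ~{hot_tb_per_month:.1f} TB/month")

# Records alive inside the 72-hour dedup window at the 8K rps average:
window_records = 8_000 * 72 * 3600
print(f"records inside the 72h window: ~{window_records / 1e9:.1f}B")
```

At these assumptions the hot tier grows roughly 12.5 TB/month before compression or TTL expiry, which is worth keeping in mind against the $40K/month budget cap in the Constraints section.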
Requirements
- Design a framework that guarantees that the same idempotency key + request fingerprint returns the original response without re-executing side effects.
- Support both online request handling and async pipeline stages (Kafka consumers, Airflow backfills, replay jobs).
- Define the storage model for idempotency keys, request hashes, response snapshots, processing state, TTL, and audit metadata.
- Handle race conditions from concurrent duplicate requests across multiple API pods and regions.
- Prevent false reuse: reject requests that send the same key with a different payload (fingerprint mismatch).
- Support reprocessing/backfills while preserving idempotent writes into downstream warehouse tables.
- Include monitoring, alerting, dead-letter handling, and operational recovery procedures.
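One way to satisfy the "same key + fingerprint, same response" and race-condition requirements above is an atomic claim-then-execute pattern: canonicalize and hash the payload, then use a unique-key insert to decide, in a single step, whether the request is new, a retry, or a key reuse with a different body. A minimal sketch, using SQLite as a stand-in for the PostgreSQL store (table and column names are illustrative, not from the spec):

```python
import hashlib
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE idempotency (
        key           TEXT PRIMARY KEY,  -- client-supplied idempotency key
        request_hash  TEXT NOT NULL,     -- fingerprint of canonical payload
        state         TEXT NOT NULL,     -- IN_PROGRESS | COMPLETED
        response_body TEXT               -- snapshot replayed on retries
    )""")

def fingerprint(payload: dict) -> str:
    # Canonical JSON (sorted keys, no whitespace) so logically equal
    # payloads hash identically regardless of field order.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

def handle(key: str, payload: dict, execute):
    fp = fingerprint(payload)
    # Atomic claim: among concurrent duplicates, exactly one insert wins.
    cur = db.execute(
        "INSERT INTO idempotency (key, request_hash, state) "
        "VALUES (?, ?, 'IN_PROGRESS') ON CONFLICT(key) DO NOTHING", (key, fp))
    if cur.rowcount == 1:                 # we own the key: run the side effect
        result = execute(payload)
        db.execute("UPDATE idempotency SET state='COMPLETED', response_body=? "
                   "WHERE key=?", (json.dumps(result), key))
        return 201, result
    row = db.execute("SELECT request_hash, state, response_body "
                     "FROM idempotency WHERE key=?", (key,)).fetchone()
    if row[0] != fp:                      # same key, different payload
        return 409, {"error": "idempotency key reused with different body"}
    if row[1] != "COMPLETED":             # concurrent duplicate still in flight
        return 409, {"error": "request in progress, retry later"}
    return 200, json.loads(row[2])        # replay the stored response
```

A production version would additionally need a lease or timeout on IN_PROGRESS rows so a crashed pod does not wedge a key forever, plus TTL cleanup for the 72-hour window; those are deliberately omitted here.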
Constraints
- AWS-first stack; existing services use EKS, PostgreSQL, Kafka, Airflow, and S3
- Budget increase capped at $40K/month
- PCI and SOX audit requirements; immutable audit trail required
- Cross-region active/passive failover; RPO < 5 minutes, RTO < 30 minutes
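For the reprocessing/backfill requirement, downstream warehouse writes stay idempotent if every load is an upsert keyed on a natural business key rather than an append, so replaying a batch (Airflow backfill, Kafka consumer restart) converges to the same final state. A sketch under that assumption, again with SQLite standing in for the warehouse and illustrative table/column names:

```python
import sqlite3

wh = sqlite3.connect(":memory:")
wh.execute("""
    CREATE TABLE ledger_entries (
        entry_id     TEXT PRIMARY KEY,  -- natural key, e.g. payment id + event type
        amount_cents INTEGER NOT NULL,
        loaded_at    TEXT NOT NULL
    )""")

def load_batch(rows):
    # Upsert keyed on entry_id: re-running the same batch overwrites rows
    # in place instead of duplicating them.
    wh.executemany(
        "INSERT INTO ledger_entries (entry_id, amount_cents, loaded_at) "
        "VALUES (?, ?, ?) "
        "ON CONFLICT(entry_id) DO UPDATE SET "
        "  amount_cents = excluded.amount_cents, loaded_at = excluded.loaded_at",
        rows)
    wh.commit()

batch = [("pay_1:created", 5000, "2024-01-01"),
         ("pay_2:created", 700, "2024-01-01")]
load_batch(batch)
load_batch(batch)  # replay: table converges, no duplicate rows
```

The same shape works in PostgreSQL (`INSERT ... ON CONFLICT ... DO UPDATE`); for append-only audit tables under the immutability constraint, the equivalent trick is to make the natural key part of the row identity and ignore conflicts instead of updating.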