Design Sub-Second CDC Sync

Scenario

You are designing a change data capture pipeline for an e-commerce platform that needs operational data from MySQL available in its warehouse with sub-second latency. Today, analysts and finance teams see inventory, orders, and refunds several minutes late, and recent executive escalations exposed mismatches between application state and reporting tables during peak traffic. The platform wants a streaming-first design that preserves row-level changes, supports schema evolution, and can recover cleanly from failures without double-applying updates.

Current State

Component	Status / Technology
Source OLTP	MySQL 8.0 primary + read replicas
CDC Capture	Nightly dumps plus hourly incremental queries
Transport	No durable event bus
Warehouse	Snowflake with raw and modeled layers
Transformations	dbt scheduled jobs
Orchestration	Shopify Flow for operational triggers; cron-based data jobs

Scale: ~25K row changes/sec peak, ~4K avg across orders, order_lines, inventory_levels, refunds, and customers; 1.5 TB source data; 300+ tables, 40 business-critical; target freshness P95 < 1 second to raw warehouse tables; 7-day replay window.

Question

How would you architect this CDC pipeline end to end so that MySQL changes land in warehouse tables with sub-second latency while preserving ordering, idempotency, and correctness under schema changes, backfills, and partial outages? Explain the design choices you would make for capture, transport, processing, warehouse loading, and operational monitoring.

Scenario

Current State

Component	Status / Technology
Source OLTP	MySQL 8.0 primary + read replicas
CDC Capture	Nightly dumps plus hourly incremental queries
Transport	No durable event bus
Warehouse	Snowflake with raw and modeled layers
Transformations	dbt scheduled jobs
Orchestration	Shopify Flow for operational triggers; cron-based data jobs

Question

Scenario

Current State

Component	Status / Technology
Source OLTP	MySQL 8.0 primary + read replicas
CDC Capture	Nightly dumps plus hourly incremental queries
Transport	No durable event bus
Warehouse	Snowflake with raw and modeled layers
Transformations	dbt scheduled jobs
Orchestration	Shopify Flow for operational triggers; cron-based data jobs

Question

Scenario

Current State

Component	Status / Technology
Source OLTP	MySQL 8.0 primary + read replicas
CDC Capture	Nightly dumps plus hourly incremental queries
Transport	No durable event bus
Warehouse	Snowflake with raw and modeled layers
Transformations	dbt scheduled jobs
Orchestration	Shopify Flow for operational triggers; cron-based data jobs

Interview Guides

Scenario

Current State

Question

Design Sub-Second CDC Sync

Scenario

Current State

Question

Your Answer

Design Sub-Second CDC Sync

Scenario

Current State

Question

Design Sub-Second CDC Sync

Scenario

Current State

Question

Your Answer