Context
Airtable stores data for many workspaces and bases in a multi-tenant architecture, but the current analytics and operational data pipelines assume a mostly centralized source of truth. As Airtable grows, a single logical database becomes a bottleneck for CDC throughput, backfills, tenant isolation, and downstream model rebuilds.
Design a pipeline architecture that supports sharding Airtable tenant data across multiple database clusters while keeping downstream consumers—warehouse models, operational metrics, billing, search indexing, and audit systems—consistent during and after shard migrations.
Scale Requirements
- Tenants: 500K+ workspaces, 50M+ active bases
- Write volume: 1.5M row/cell mutations/sec peak across shards
- CDC volume: 250 MB/sec sustained, 2 TB/day compressed change logs
- Latency: < 60 seconds from source commit until the change is queryable in Snowflake
- Backfill SLA: Rebuild any tenant's 90-day history in < 6 hours
- Availability: 99.95% pipeline uptime during shard rebalancing
- Retention: 1 year raw CDC, 7 years audit exports
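
The targets above imply a few derived figures that are worth checking before sizing anything. The following is a back-of-envelope sketch, not part of the brief: it assumes the 250 MB/sec figure is uncompressed, that units are decimal (1 TB = 1,000,000 MB), and that a month is 30 days. All variable names are illustrative.

```python
# Back-of-envelope arithmetic on the stated scale targets.
# Assumptions (not in the brief): 250 MB/sec is uncompressed, units are
# decimal, and availability is measured over a 30-day month.

SECONDS_PER_DAY = 86_400

cdc_mb_per_sec = 250          # sustained CDC volume
compressed_tb_per_day = 2     # compressed change logs per day
raw_retention_days = 365      # 1 year raw CDC
backfill_history_days = 90    # history to rebuild per tenant
backfill_sla_hours = 6        # time allowed for that rebuild
availability = 0.9995         # pipeline uptime target

# Uncompressed CDC per day, converted from MB to TB.
raw_tb_per_day = cdc_mb_per_sec * SECONDS_PER_DAY / 1_000_000

# Compression ratio implied by the two CDC figures.
implied_compression_ratio = raw_tb_per_day / compressed_tb_per_day

# Compressed storage needed to hold one year of raw CDC.
raw_cdc_retention_tb = compressed_tb_per_day * raw_retention_days

# How much faster than real time a tenant backfill has to replay.
backfill_speedup = backfill_history_days * 24 / backfill_sla_hours

# Downtime allowed per 30-day month at the availability target.
downtime_budget_minutes = (1 - availability) * 30 * 24 * 60

print(f"raw CDC per day:        {raw_tb_per_day:.1f} TB")
print(f"implied compression:    {implied_compression_ratio:.1f}x")
print(f"1-year raw CDC storage: {raw_cdc_retention_tb} TB compressed")
print(f"backfill replay speed:  {backfill_speedup:.0f}x real time")
print(f"downtime budget:        {downtime_budget_minutes:.1f} min / 30 days")
```

Under these assumptions, the two CDC figures are consistent only with roughly 10:1 compression, and a tenant backfill has to replay 90 days of history at about 360 times real-time speed.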
Requirements
- Design an ingestion strategy for CDC from multiple Airtable database shards into a unified pipeline.
- Preserve tenant-level ordering and idempotency during shard splits, merges, and tenant moves.
- Support both streaming updates and batch backfills without double counting.
- Build canonical downstream tables keyed by workspace_id, base_id, table_id, and shard metadata.
- Define how orchestration handles shard bootstrap, cutover, replay, and validation.
- Include data quality checks for missing tenants, duplicate mutations, out-of-order events, and schema drift.
- Describe how Airtable would expose shard lineage so downstream systems can answer "where did this tenant live at time T?"
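
One way to make the ordering, idempotency, and lineage requirements concrete is a per-tenant placement history plus a high-water mark keyed by placement epoch and sequence number. The sketch below is a minimal in-memory illustration, not a proposed implementation. `Placement`, `ShardLineage`, and `IdempotentApplier` are hypothetical names, and it assumes each tenant move bumps an epoch, that sequence numbers increase within an epoch, and that events for a workspace arrive in order (for example, one Kafka partition key per workspace).

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    shard_id: str
    epoch: int       # increments on every move, split, or merge (assumption)
    valid_from: int  # Unix seconds, inclusive


class ShardLineage:
    """Answers 'where did this tenant live at time T?' from an append-only log."""

    def __init__(self):
        # workspace_id -> placements sorted by valid_from
        self._by_workspace = {}

    def record_move(self, workspace_id, shard_id, valid_from):
        history = self._by_workspace.setdefault(workspace_id, [])
        if history and valid_from <= history[-1].valid_from:
            raise ValueError("placements must be appended in time order")
        history.append(Placement(shard_id, len(history) + 1, valid_from))

    def shard_at(self, workspace_id, ts):
        history = self._by_workspace.get(workspace_id, [])
        # Index of the last placement whose valid_from is <= ts.
        idx = bisect_right([p.valid_from for p in history], ts) - 1
        return history[idx] if idx >= 0 else None


class IdempotentApplier:
    """Accepts each (epoch, seq) at most once per workspace."""

    def __init__(self):
        # workspace_id -> highest (epoch, seq) applied so far
        self._high_water = {}

    def apply(self, workspace_id, epoch, seq):
        key = (epoch, seq)
        # Tuples compare by epoch first, so events from a newer shard always
        # sort after events from the shard the tenant moved away from.
        if key <= self._high_water.get(workspace_id, (0, 0)):
            return False  # duplicate, replay, or stale event from an old shard
        self._high_water[workspace_id] = key
        return True
```

With this shape, a replayed backfill event or a late write from the pre-migration shard is rejected rather than double counted, and `shard_at("ws_1", ts)` returns the placement that was active at `ts`. A high-water mark silently drops late out-of-order events, so a real design would need to route those to a quarantine or reconciliation path rather than discard them.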
Constraints
- Assume Airtable runs primarily on AWS with Kafka, Airflow, dbt, and Snowflake already available.
- Cross-shard distributed transactions are not available.
- Tenant migrations must not pause writes for more than 30 seconds.
- Cost target: incremental platform spend under $150K/month.
- Compliance: enterprise auditability and tenant deletion workflows must continue to work across old and new shards.