Context
AtlasMart, a global retail marketplace, serves product catalog and pricing data to search, recommendations, and BI systems. The current PostgreSQL OLTP database is used directly by downstream readers, but it now handles 95% read traffic and only small hourly updates, causing cache churn, slow analytical joins, and replication lag during peak traffic.
You are asked to design a pipeline and storage pattern for a high-read, low-write reference dataset that supports both operational APIs and analytics consumers. The goal is to apply database theory in practice: precomputation, denormalization, indexing, partitioning, materialized views, and immutable snapshots to optimize read performance without sacrificing correctness.
Scale Requirements
- Primary dataset: 180M active product records, 12 attributes on average, ~4 TB compressed
- Change volume: 8M row updates/day, mostly price, inventory status, and merchandising flags
- Read workload: 220K queries/sec peak across APIs and internal services
- Latency target: P95 lookup latency < 25 ms for APIs; analytics table freshness < 15 minutes
- Retention: 2 years of historical snapshots and change history
Requirements
- Design an ingestion and transformation pipeline that converts normalized OLTP tables into read-optimized serving tables.
- Support incremental ELT using change data capture, while preserving a daily immutable snapshot for rollback and backfills.
- Explain how database theory informs the design: normalization vs denormalization, covering indexes, materialized views, partition pruning, and read replicas.
- Guarantee idempotent loads and deterministic rebuilds for any partition or date.
- Define data quality checks for key uniqueness, referential integrity, freshness, and row-count reconciliation.
- Expose outputs to both a low-latency serving store and Snowflake for analytical workloads.
Constraints
- Existing stack is AWS-first: PostgreSQL 14, Debezium, Kafka, Airflow, dbt, Snowflake, Redis
- Incremental budget cap: $30K/month
- No full-table reloads during business hours
- Must support SOX auditability for price changes and replay any day within 24 months