Unify Relational and Document Sources

Scenario

You are building a new analytics pipeline for a marketplace platform that stores orders in a transactional relational database and product catalog metadata in a document database. Finance and operations reports currently disagree because structured order facts are loaded nightly while semi-structured catalog attributes are flattened inconsistently by different teams. Leadership wants a single Snowflake model that preserves transactional integrity from the relational source while still supporting flexible ingestion of evolving non-relational fields. The immediate pain point is a month-end reconciliation issue caused by missing and duplicated product attributes in downstream reporting.

Current State

Component	Status
Order System	PostgreSQL 14, normalized OLTP schema
Catalog System	MongoDB 6.0, nested JSON documents
Ingestion	Nightly exports to cloud object storage
Warehouse	Snowflake raw and curated databases
Transformations	Snowflake Tasks + dbt incremental models
Data Quality	Manual SQL checks, no centralized alerts

Scale: ~120M order rows, ~15M product documents, 8M daily order changes, 500K daily catalog updates, 2-hour freshness target, 2-year history retention.

Question

How would you design the Snowflake-centric ELT pipeline so that relational order data and non-relational catalog data can be ingested, modeled, and reconciled consistently for analytics? Explain how your design would handle schema evolution, joins between structured and semi-structured data, and data quality validation at this scale.

Scenario

Current State

Component	Status
Order System	PostgreSQL 14, normalized OLTP schema
Catalog System	MongoDB 6.0, nested JSON documents
Ingestion	Nightly exports to cloud object storage
Warehouse	Snowflake raw and curated databases
Transformations	Snowflake Tasks + dbt incremental models
Data Quality	Manual SQL checks, no centralized alerts

Scale: ~120M order rows, ~15M product documents, 8M daily order changes, 500K daily catalog updates, 2-hour freshness target, 2-year history retention.

Question

Scenario

Current State

Component	Status
Order System	PostgreSQL 14, normalized OLTP schema
Catalog System	MongoDB 6.0, nested JSON documents
Ingestion	Nightly exports to cloud object storage
Warehouse	Snowflake raw and curated databases
Transformations	Snowflake Tasks + dbt incremental models
Data Quality	Manual SQL checks, no centralized alerts

Scale: ~120M order rows, ~15M product documents, 8M daily order changes, 500K daily catalog updates, 2-hour freshness target, 2-year history retention.

Question

Scenario

Current State

Component	Status
Order System	PostgreSQL 14, normalized OLTP schema
Catalog System	MongoDB 6.0, nested JSON documents
Ingestion	Nightly exports to cloud object storage
Warehouse	Snowflake raw and curated databases
Transformations	Snowflake Tasks + dbt incremental models
Data Quality	Manual SQL checks, no centralized alerts

Scale: ~120M order rows, ~15M product documents, 8M daily order changes, 500K daily catalog updates, 2-hour freshness target, 2-year history retention.

Interview Guides

Scenario

Current State

Question

Unify Relational and Document Sources

Scenario

Current State

Question

Your Answer

Unify Relational and Document Sources

Scenario

Current State

Question

Unify Relational and Document Sources

Scenario

Current State

Question

Your Answer