Unify Relational and Document Sources

Scenario

You are building a new analytics pipeline for a SaaS platform that stores billing and account data in a transactional relational database, while product activity and user preferences live in a document store. Finance and product teams are reporting conflicting customer metrics because the current daily exports flatten each source differently and drop nested attributes. Leadership wants a Snowflake-based pipeline that preserves the strengths of both source models while producing consistent analytics tables. The immediate trigger is an executive escalation after monthly active customer counts differed by 8% across dashboards.

Current State

Component	Status / Technology
Transactional source	PostgreSQL with normalized customer, subscription, and invoice tables
Operational event source	MongoDB with nested user profile and activity documents
Ingestion	Nightly Python scripts dumping CSV and JSON to Snowflake internal stages
Transformation	Ad hoc SQL scripts run manually in Snowflake worksheets
Warehouse	Snowflake raw and reporting schemas
Orchestration	Basic cron jobs on a VM

Scale: ~250 GB/day new data, 120M PostgreSQL rows, 400M MongoDB documents, nightly batch window of 3 hours, dashboard freshness target of under 30 minutes after load completion.

Question

How would you redesign this pipeline in Snowflake so relational and non-relational data can be ingested, modeled, and reconciled consistently for analytics? Explain how you would handle schema differences, incremental processing, and data quality so downstream teams get one trusted set of customer metrics.

Scenario

Current State

Component	Status / Technology
Transactional source	PostgreSQL with normalized customer, subscription, and invoice tables
Operational event source	MongoDB with nested user profile and activity documents
Ingestion	Nightly Python scripts dumping CSV and JSON to Snowflake internal stages
Transformation	Ad hoc SQL scripts run manually in Snowflake worksheets
Warehouse	Snowflake raw and reporting schemas
Orchestration	Basic cron jobs on a VM

Scale: ~250 GB/day new data, 120M PostgreSQL rows, 400M MongoDB documents, nightly batch window of 3 hours, dashboard freshness target of under 30 minutes after load completion.

Question

Scenario

Current State

Component	Status / Technology
Transactional source	PostgreSQL with normalized customer, subscription, and invoice tables
Operational event source	MongoDB with nested user profile and activity documents
Ingestion	Nightly Python scripts dumping CSV and JSON to Snowflake internal stages
Transformation	Ad hoc SQL scripts run manually in Snowflake worksheets
Warehouse	Snowflake raw and reporting schemas
Orchestration	Basic cron jobs on a VM

Scale: ~250 GB/day new data, 120M PostgreSQL rows, 400M MongoDB documents, nightly batch window of 3 hours, dashboard freshness target of under 30 minutes after load completion.

Question

Scenario

Current State

Component	Status / Technology
Transactional source	PostgreSQL with normalized customer, subscription, and invoice tables
Operational event source	MongoDB with nested user profile and activity documents
Ingestion	Nightly Python scripts dumping CSV and JSON to Snowflake internal stages
Transformation	Ad hoc SQL scripts run manually in Snowflake worksheets
Warehouse	Snowflake raw and reporting schemas
Orchestration	Basic cron jobs on a VM

Scale: ~250 GB/day new data, 120M PostgreSQL rows, 400M MongoDB documents, nightly batch window of 3 hours, dashboard freshness target of under 30 minutes after load completion.

Interview Guides

Scenario

Current State

Question

Unify Relational and Document Sources

Scenario

Current State

Question

Your Answer

Unify Relational and Document Sources

Scenario

Current State

Question

Unify Relational and Document Sources

Scenario

Current State

Question

Your Answer