Context
FinSight, a B2B fintech analytics company, currently uses nightly ETL jobs to extract data from PostgreSQL, Salesforce, and Stripe, transform it on Apache Spark, and load curated tables into Snowflake. The platform now supports self-serve analytics for 300 internal users, but the pre-load ETL design slows schema changes, increases pipeline maintenance effort, and delays analyst access to raw source data.
Your task is to design a modernized batch pipeline and explain when ETL versus ELT is the appropriate pattern. The target architecture should preserve existing batch SLAs while shifting most transformations into Snowflake for faster iteration and better lineage.
Scale Requirements
- Sources: PostgreSQL OLTP (~2 TB), Salesforce (~150 GB), Stripe API (~80 GB/month)
- Daily ingest volume: ~450 GB/day incremental, ~12M changed rows/day
- Latency target: raw data available in the warehouse within 15 minutes of extraction; business marts refreshed hourly
- Retention: raw history for 2 years, curated marts for 5 years
- Concurrency: 300 BI users, 40 scheduled dashboard queries, 25 dbt jobs/hour
Requirements
- Design a batch pipeline that lands source data with minimal transformation and supports both ETL and ELT patterns.
- Show how raw, staging, and mart layers are modeled in Snowflake.
- Define how incremental loads, idempotent reruns, and backfills are handled (the layering and MERGE sketch after this list shows one approach).
- Include data quality checks for schema drift, null spikes, duplicate primary keys, and row-count anomalies (see the check sketch after this list).
- Explain which transformations should remain pre-load (ETL) versus post-load in-warehouse (ELT).
- Describe orchestration, monitoring, and failure recovery for hourly and nightly jobs (see the orchestration sketch after this list).
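
To make the layering and incremental-load requirements concrete, here is a minimal sketch assuming a RAW / STAGING / MARTS database layout, a hypothetical ORDERS table keyed by ORDER_ID with an EXTRACTED_AT watermark, and the snowflake-connector-python client. It lands files with COPY INTO and then MERGEs into the raw history table so that re-running a batch (or a backfill window) does not create duplicates; none of the object names are taken from the existing FinSight schema.

```python
"""Sketch of an idempotent incremental load into a layered Snowflake model.
Object names, the S3 stage, and the EXTRACTED_AT watermark are illustrative
assumptions, not the existing FinSight schema."""
import snowflake.connector

# Illustrative layering:
#   RAW.POSTGRES.ORDERS      -- landed as-is, append-only, 2-year retention
#   STAGING.CORE.STG_ORDERS  -- typed, deduplicated dbt model over RAW
#   MARTS.FINANCE.FCT_ORDERS -- curated business mart, refreshed hourly via dbt

# Land one batch of Parquet files into a transient landing table (ELT: no
# pre-load business logic). FORCE = TRUE reloads files on a rerun; the MERGE
# below is what keeps the rerun idempotent.
COPY_TO_LANDING = """
COPY INTO RAW.POSTGRES.ORDERS_LANDING
FROM @RAW.POSTGRES.S3_STAGE/orders/{batch_id}/
FILE_FORMAT = (TYPE = PARQUET)
MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
FORCE = TRUE
"""

# MERGE keyed on the business key plus extraction timestamp: replaying the
# same batch or a historical backfill window inserts nothing new.
MERGE_INTO_RAW = """
MERGE INTO RAW.POSTGRES.ORDERS AS tgt
USING RAW.POSTGRES.ORDERS_LANDING AS src
  ON tgt.ORDER_ID = src.ORDER_ID AND tgt.EXTRACTED_AT = src.EXTRACTED_AT
WHEN NOT MATCHED THEN
  INSERT (ORDER_ID, PAYLOAD, EXTRACTED_AT)
  VALUES (src.ORDER_ID, src.PAYLOAD, src.EXTRACTED_AT)
"""


def load_batch(conn: snowflake.connector.SnowflakeConnection, batch_id: str) -> None:
    """Run one incremental load; a backfill is a loop over historical batch ids."""
    cur = conn.cursor()
    try:
        cur.execute("TRUNCATE TABLE RAW.POSTGRES.ORDERS_LANDING")
        cur.execute(COPY_TO_LANDING.format(batch_id=batch_id))
        cur.execute(MERGE_INTO_RAW)
    finally:
        cur.close()
```

In this layout the staging and mart layers would be dbt models selecting from RAW, which is where most of the post-load (ELT) logic lives and where schema changes stay cheap to iterate on.

For the data quality requirement, the sketch below reuses the same illustrative ORDERS table and runs lightweight SQL checks after each load: duplicate primary keys, a null-rate spike on a hypothetical customer_id field, schema drift via INFORMATION_SCHEMA, and a row-count anomaly against the trailing 7-day hourly average. Thresholds are placeholders, and in practice these could equally be expressed as dbt tests.

```python
"""Sketch of post-load data quality checks (duplicate keys, null spikes,
schema drift, row-count anomalies). Table names, columns, and thresholds
are illustrative assumptions."""

CHECKS = [
    # (name, SQL returning a single number, maximum allowed value)
    ("duplicate_primary_keys", """
        SELECT COUNT(*) FROM (
            SELECT ORDER_ID, EXTRACTED_AT FROM RAW.POSTGRES.ORDERS
            GROUP BY ORDER_ID, EXTRACTED_AT HAVING COUNT(*) > 1) AS dup
     """, 0),
    ("null_customer_id_pct_last_hour", """
        SELECT 100 * COUNT_IF(PAYLOAD:customer_id IS NULL) / NULLIF(COUNT(*), 0)
        FROM RAW.POSTGRES.ORDERS
        WHERE EXTRACTED_AT >= DATEADD(hour, -1, CURRENT_TIMESTAMP())
     """, 5),
    # Schema drift: any column not in the expected list fails the check.
    ("unexpected_columns", """
        SELECT COUNT(*) FROM RAW.INFORMATION_SCHEMA.COLUMNS
        WHERE TABLE_SCHEMA = 'POSTGRES' AND TABLE_NAME = 'ORDERS'
          AND COLUMN_NAME NOT IN ('ORDER_ID', 'PAYLOAD', 'EXTRACTED_AT')
     """, 0),
    # Row-count anomaly: last hour vs. trailing 7-day hourly average; a
    # mirror-image check with a minimum threshold would catch sudden drops.
    ("rowcount_pct_of_7d_hourly_avg", """
        SELECT 100 * COUNT_IF(EXTRACTED_AT >= DATEADD(hour, -1, CURRENT_TIMESTAMP()))
               / NULLIF(COUNT(*) / (24 * 7), 0)
        FROM RAW.POSTGRES.ORDERS
        WHERE EXTRACTED_AT >= DATEADD(day, -7, CURRENT_TIMESTAMP())
     """, 300),
]


def run_checks(cursor) -> list[str]:
    """Return the names of failed checks; the caller decides whether to halt
    downstream dbt runs or only raise an alert."""
    failures = []
    for name, sql, max_allowed in CHECKS:
        value = cursor.execute(sql).fetchone()[0] or 0
        if value > max_allowed:
            failures.append(f"{name}: {value} > {max_allowed}")
    return failures
```

One low-overhead way to meet the orchestration and recovery requirement with a three-engineer team is a managed Airflow deployment (e.g., MWAA, assuming Airflow 2.4+). The DAG below is a sketch only: the task callables are stubs, retries give automatic recovery from transient failures, and catchup is disabled so backfills are triggered deliberately rather than implicitly. A sibling nightly DAG would follow the same shape for the heavier full-refresh jobs.

```python
"""Minimal Airflow sketch of the hourly ELT cycle: extract -> land in RAW ->
quality checks -> dbt staging/marts. Callables and names are illustrative."""
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def placeholder(**_):
    """Stand-in for the real extract / load / check / dbt steps in this sketch."""


default_args = {
    "owner": "data-eng",
    "retries": 2,                        # automatic recovery from transient failures
    "retry_delay": timedelta(minutes=5),
    "email_on_failure": True,            # alert the on-call engineer
}

with DAG(
    dag_id="hourly_elt_orders",
    schedule="@hourly",
    start_date=datetime(2024, 1, 1),
    catchup=False,                       # backfills are triggered explicitly
    default_args=default_args,
    max_active_runs=1,                   # avoid overlapping loads of the same table
) as dag:
    extract = PythonOperator(task_id="extract_sources", python_callable=placeholder)
    land = PythonOperator(task_id="merge_into_raw", python_callable=placeholder)
    checks = PythonOperator(task_id="run_quality_checks", python_callable=placeholder)
    transform = PythonOperator(task_id="dbt_staging_and_marts", python_callable=placeholder)

    extract >> land >> checks >> transform
```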
Constraints
- Existing stack is AWS-based and Snowflake is already the central warehouse.
- Team size is 3 data engineers; operational simplicity matters more than perfect real-time freshness.
- Budget increase is capped at $15K/month.
- PII must be masked in analytics schemas and deleted within 30 days of verified deletion requests (one masking approach is sketched below).
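
A minimal sketch of one way to satisfy the masking constraint using Snowflake dynamic data masking; the policy, role, and column names are assumptions. The 30-day deletion obligation is a separate flow and would additionally need a scheduled DELETE against the raw layer plus a rebuild of affected marts.

```python
"""Sketch of column-level PII masking in the analytics schemas via Snowflake
masking policies. Policy, role, and column names are illustrative."""

CREATE_POLICY = """
CREATE MASKING POLICY IF NOT EXISTS MARTS.GOVERNANCE.MASK_EMAIL
AS (val STRING) RETURNS STRING ->
  CASE
    WHEN CURRENT_ROLE() IN ('PII_ANALYST') THEN val
    ELSE '***MASKED***'
  END
"""

APPLY_POLICY = """
ALTER TABLE MARTS.FINANCE.DIM_CUSTOMER
  MODIFY COLUMN EMAIL SET MASKING POLICY MARTS.GOVERNANCE.MASK_EMAIL
"""


def apply_pii_masking(cursor) -> None:
    """Create the masking policy and bind it to the customer email column,
    so BI roles see masked values while the PII_ANALYST role sees plaintext."""
    cursor.execute(CREATE_POLICY)
    cursor.execute(APPLY_POLICY)
```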