Context
FinSight, a personal finance analytics company, runs daily ETL pipelines that ingest transaction, account, and customer data from PostgreSQL, Stripe, and S3 into Snowflake for internal reporting. The current Airflow-based pipeline does not complete reliably, and the data team lacks clear visibility into runtime regressions, failed tasks, stale tables, and data quality issues.
You are asked to design a monitoring strategy for this pipeline so engineers can quickly detect performance degradation, data freshness problems, and quality failures before business users see incorrect dashboards.
Scale Requirements
- Sources: 3 primary sources (PostgreSQL OLTP, Stripe API, S3 partner files)
- Volume: 250M rows/day, ~1.2 TB raw data/day
- Pipeline cadence: Daily full load for some tables, hourly incremental loads for high-value fact tables
- Latency target: Hourly tables available in Snowflake within 20 minutes of scheduled start
- SLA: 99.5% successful scheduled runs per month (see the error-budget sketch after this list)
- Retention: 180 days of raw logs and 13 months of warehouse history
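To make the SLA concrete, a rough error budget can be derived from the cadence above; the run counts in this sketch are assumptions (24 hourly incremental runs plus one daily full load, over a 30-day month).

```python
# Rough error budget implied by the 99.5% monthly SLA.
# Assumed cadence: 24 hourly incremental runs + 1 daily full-load run, 30-day month.
hourly_runs = 24 * 30                         # 720 incremental runs per month
daily_runs = 1 * 30                           # 30 full-load runs per month
scheduled_runs = hourly_runs + daily_runs     # 750 scheduled runs per month
allowed_failures = scheduled_runs * (1 - 0.995)
print(round(allowed_failures, 1))             # 3.8 -> roughly 3-4 failed runs before the SLA is breached
```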
Requirements
- Define how to monitor pipeline performance across ingestion, transformation, and load stages.
- Track task runtime, queue delays, retry counts, row counts, and end-to-end freshness.
- Detect data quality issues such as null spikes, duplicate primary keys, and schema drift (see the quality-check sketch below).
- Propose alerting thresholds and escalation paths for failures and slowdowns.
- Show how monitoring integrates with Airflow, dbt, and Snowflake (an Airflow callback sketch follows this list).
- Include a lightweight approach for historical trend analysis and SLA reporting (an example SLA query follows this list).
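As one possible integration point with Airflow, the sketch below pushes task runtime and retry counts as custom CloudWatch metrics from success/failure callbacks; the `FinSight/Pipeline` namespace and dimension names are assumptions, and row counts or freshness timestamps could be emitted the same way from load tasks.

```python
# Minimal sketch: Airflow task callbacks emitting custom CloudWatch metrics.
# The namespace and dimensions are illustrative, not an existing convention.
import boto3

cloudwatch = boto3.client("cloudwatch")

def emit_task_metrics(context):
    ti = context["task_instance"]
    dims = [
        {"Name": "dag_id", "Value": ti.dag_id},
        {"Name": "task_id", "Value": ti.task_id},
    ]
    cloudwatch.put_metric_data(
        Namespace="FinSight/Pipeline",   # hypothetical namespace
        MetricData=[
            {"MetricName": "task_duration_seconds",
             "Value": float(ti.duration or 0.0),
             "Unit": "Seconds", "Dimensions": dims},
            # try_number is used here as a rough retry count for the attempt
            {"MetricName": "task_retries",
             "Value": float(max(ti.try_number - 1, 0)),
             "Unit": "Count", "Dimensions": dims},
        ],
    )

# Applied to every task via the DAG's default_args:
default_args = {
    "on_success_callback": emit_task_metrics,
    "on_failure_callback": emit_task_metrics,
}
```

CloudWatch alarms on these metrics can then drive the alerting thresholds and escalation paths without introducing a new monitoring platform.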
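For the quality checks, a sketch of warehouse-side tests for null spikes and duplicate primary keys is below, assuming the snowflake-connector-python client, an illustrative analytics.transactions table with a load_date column, and placeholder thresholds to tune. The same null and uniqueness checks could equally be expressed as dbt's built-in not_null and unique tests.

```python
# Minimal sketch of warehouse-side quality checks against Snowflake.
# Table name, column names, and thresholds are assumptions.
import snowflake.connector

CHECKS = {
    # check name: (SQL returning a single value, alert threshold)
    "null_spike_customer_id": (
        "SELECT COUNT_IF(customer_id IS NULL) / NULLIF(COUNT(*), 0) "
        "FROM analytics.transactions WHERE load_date = CURRENT_DATE",
        0.01,   # alert if more than 1% of today's rows have a null customer_id
    ),
    "duplicate_transaction_id": (
        "SELECT COUNT(*) - COUNT(DISTINCT transaction_id) "
        "FROM analytics.transactions WHERE load_date = CURRENT_DATE",
        0,      # alert on any duplicated primary key
    ),
}

def run_checks(conn: "snowflake.connector.SnowflakeConnection") -> list[str]:
    """Return a list of human-readable failures for alerting."""
    failures = []
    cur = conn.cursor()
    try:
        for name, (sql, threshold) in CHECKS.items():
            value = cur.execute(sql).fetchone()[0] or 0
            if value > threshold:
                failures.append(f"{name}: observed {value}, threshold {threshold}")
    finally:
        cur.close()
    return failures
```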
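For trend analysis and SLA reporting, one lightweight option is a scheduled query over a run log landed in Snowflake; the ops.pipeline_run_log table and its columns in this sketch are assumptions.

```python
# Minimal sketch of a monthly SLA/trend query, assuming run outcomes are landed in
# an illustrative ops.pipeline_run_log table (dag_id, run_date, scheduled_start,
# finished_at, status). Execute with e.g. conn.cursor().execute(SLA_SQL).fetchall().
SLA_SQL = """
SELECT
    DATE_TRUNC('month', run_date)                     AS month,
    COUNT(*)                                          AS scheduled_runs,
    COUNT_IF(status = 'success')                      AS successful_runs,
    successful_runs / NULLIF(scheduled_runs, 0)       AS success_rate,
    APPROX_PERCENTILE(
        DATEDIFF('minute', scheduled_start, finished_at), 0.95
    )                                                 AS p95_minutes_to_available
FROM ops.pipeline_run_log
WHERE run_date >= DATEADD('month', -13, CURRENT_DATE)  -- matches the 13-month warehouse retention
GROUP BY 1
ORDER BY 1
"""
```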
Constraints
- Existing stack is AWS + Snowflake; avoid introducing a large new platform.
- Team size is 3 data engineers; operational overhead must stay low.
- Budget allows managed monitoring tools but not a full observability rebuild.
- PCI-related financial data requires auditability of failures, retries, and manual reruns (see the audit-record sketch below).
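To satisfy the auditability constraint, one option is an append-only audit record written from Airflow callbacks; the bucket name and key layout in this sketch are assumptions, and bucket versioning or Object Lock (configured outside this code) would provide the immutability.

```python
# Minimal sketch of an append-only audit trail for failures, retries, and manual
# reruns, written as immutable JSON objects to an illustrative S3 bucket.
import json
import uuid
from datetime import datetime, timezone

import boto3

s3 = boto3.client("s3")
AUDIT_BUCKET = "finsight-pipeline-audit"   # hypothetical bucket name

def write_audit_event(context, event_type: str, actor: str = "airflow"):
    ti = context["task_instance"]
    record = {
        "event_type": event_type,                       # e.g. "failure", "retry", "manual_rerun"
        "dag_id": ti.dag_id,
        "task_id": ti.task_id,
        "logical_date": str(context.get("logical_date")),  # "execution_date" on older Airflow
        "try_number": ti.try_number,
        "actor": actor,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }
    key = f"audit/{ti.dag_id}/{record['recorded_at'][:10]}/{uuid.uuid4()}.json"
    s3.put_object(Bucket=AUDIT_BUCKET, Key=key, Body=json.dumps(record).encode())

# Wired into DAG default_args, for example:
# default_args = {
#     "on_failure_callback": lambda ctx: write_audit_event(ctx, "failure"),
#     "on_retry_callback":   lambda ctx: write_audit_event(ctx, "retry"),
# }
```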