Redesign Failing Orders ETL Pipeline

Context

Northstar Retail, a mid-market e-commerce company, has an orders and fulfillment reporting pipeline built on nightly batch ETL. The current workflow uses Apache Airflow to orchestrate Python jobs that extract from PostgreSQL, payment APIs, and warehouse management exports, then load curated tables into Snowflake. The pipeline regularly misses the 7 a.m. business SLA, produces duplicate orders during retries, and cannot support same-day operational reporting.

You are asked to redesign the workflow so it meets business requirements for freshness, correctness, and recoverability without replacing the entire AWS-based platform.

Scale Requirements

Sources: PostgreSQL OLTP, Stripe API, S3 CSV exports from 3PL partners
Volume: 45M order line items/day, 1.2TB raw data/day
Peak ingest: 8K order events/sec during promotions
Latency target: curated operational tables available within 15 minutes of source updates
Batch SLA: full daily financial reconciliation completed by 6:30 a.m. UTC
Retention: 2 years curated data, 90 days raw landing zone

Requirements

Design a revised pipeline that supports both incremental near-real-time ingestion and daily reconciliation.
Ensure idempotent processing so retries do not create duplicate facts or double-count revenue.
Handle late-arriving updates such as refunds, cancellations, and shipment status changes.
Implement data quality checks for schema drift, null business keys, duplicate order IDs, and reconciliation mismatches.
Support historical backfills for 30 days without disrupting current production loads.
Define orchestration dependencies, retry behavior, and rollback/reprocessing strategy.
Expose analytics-ready tables for finance, operations, and customer support teams.

Constraints

Must stay on AWS + Snowflake; no full platform migration.
Team size: 3 data engineers, 1 analytics engineer.
Incremental infrastructure budget: <$18K/month.
PCI-sensitive payment fields cannot be stored in raw logs beyond tokenized values.
Source PostgreSQL cannot tolerate heavy read load or long-running extraction queries.

Context

You are asked to redesign the workflow so it meets business requirements for freshness, correctness, and recoverability without replacing the entire AWS-based platform.

Scale Requirements

Sources: PostgreSQL OLTP, Stripe API, S3 CSV exports from 3PL partners
Volume: 45M order line items/day, 1.2TB raw data/day
Peak ingest: 8K order events/sec during promotions
Latency target: curated operational tables available within 15 minutes of source updates
Batch SLA: full daily financial reconciliation completed by 6:30 a.m. UTC
Retention: 2 years curated data, 90 days raw landing zone

Requirements

Design a revised pipeline that supports both incremental near-real-time ingestion and daily reconciliation.
Ensure idempotent processing so retries do not create duplicate facts or double-count revenue.
Handle late-arriving updates such as refunds, cancellations, and shipment status changes.
Implement data quality checks for schema drift, null business keys, duplicate order IDs, and reconciliation mismatches.
Support historical backfills for 30 days without disrupting current production loads.
Define orchestration dependencies, retry behavior, and rollback/reprocessing strategy.
Expose analytics-ready tables for finance, operations, and customer support teams.

Constraints

Must stay on AWS + Snowflake; no full platform migration.
Team size: 3 data engineers, 1 analytics engineer.
Incremental infrastructure budget: <$18K/month.
PCI-sensitive payment fields cannot be stored in raw logs beyond tokenized values.
Source PostgreSQL cannot tolerate heavy read load or long-running extraction queries.

Context

You are asked to redesign the workflow so it meets business requirements for freshness, correctness, and recoverability without replacing the entire AWS-based platform.

Scale Requirements

Sources: PostgreSQL OLTP, Stripe API, S3 CSV exports from 3PL partners
Volume: 45M order line items/day, 1.2TB raw data/day
Peak ingest: 8K order events/sec during promotions
Latency target: curated operational tables available within 15 minutes of source updates
Batch SLA: full daily financial reconciliation completed by 6:30 a.m. UTC
Retention: 2 years curated data, 90 days raw landing zone

Requirements

Design a revised pipeline that supports both incremental near-real-time ingestion and daily reconciliation.
Ensure idempotent processing so retries do not create duplicate facts or double-count revenue.
Handle late-arriving updates such as refunds, cancellations, and shipment status changes.
Implement data quality checks for schema drift, null business keys, duplicate order IDs, and reconciliation mismatches.
Support historical backfills for 30 days without disrupting current production loads.
Define orchestration dependencies, retry behavior, and rollback/reprocessing strategy.
Expose analytics-ready tables for finance, operations, and customer support teams.

Constraints

Must stay on AWS + Snowflake; no full platform migration.
Team size: 3 data engineers, 1 analytics engineer.
Incremental infrastructure budget: <$18K/month.
PCI-sensitive payment fields cannot be stored in raw logs beyond tokenized values.
Source PostgreSQL cannot tolerate heavy read load or long-running extraction queries.

Context

You are asked to redesign the workflow so it meets business requirements for freshness, correctness, and recoverability without replacing the entire AWS-based platform.

Scale Requirements

Sources: PostgreSQL OLTP, Stripe API, S3 CSV exports from 3PL partners
Volume: 45M order line items/day, 1.2TB raw data/day
Peak ingest: 8K order events/sec during promotions
Latency target: curated operational tables available within 15 minutes of source updates
Batch SLA: full daily financial reconciliation completed by 6:30 a.m. UTC
Retention: 2 years curated data, 90 days raw landing zone

Requirements

Design a revised pipeline that supports both incremental near-real-time ingestion and daily reconciliation.
Ensure idempotent processing so retries do not create duplicate facts or double-count revenue.
Handle late-arriving updates such as refunds, cancellations, and shipment status changes.
Implement data quality checks for schema drift, null business keys, duplicate order IDs, and reconciliation mismatches.
Support historical backfills for 30 days without disrupting current production loads.
Define orchestration dependencies, retry behavior, and rollback/reprocessing strategy.
Expose analytics-ready tables for finance, operations, and customer support teams.

Constraints

Must stay on AWS + Snowflake; no full platform migration.
Team size: 3 data engineers, 1 analytics engineer.
Incremental infrastructure budget: <$18K/month.
PCI-sensitive payment fields cannot be stored in raw logs beyond tokenized values.
Source PostgreSQL cannot tolerate heavy read load or long-running extraction queries.

Interview Guides

Context

Scale Requirements

Requirements

Constraints

Redesign Failing Orders ETL Pipeline

Context

Scale Requirements

Requirements

Constraints

Your Answer

Redesign Failing Orders ETL Pipeline

Context

Scale Requirements

Requirements

Constraints

Redesign Failing Orders ETL Pipeline

Context

Scale Requirements

Requirements

Constraints

Your Answer