Context
BrightCart, a mid-sized e-commerce company, runs daily marketing and sales reporting from Shopify, Google Ads, Meta Ads, and a PostgreSQL order database. Today, analysts manually trigger scripts and rerun failed jobs, causing inconsistent dashboards and delayed reporting.
You need to design an operationally efficient data pipeline platform that standardizes ingestion, transformation, scheduling, and monitoring. The goal is to reduce manual intervention and provide reliable daily and hourly datasets for finance and growth teams.
Scale Requirements
- Sources: 4 primary systems, expanding to 12 within 12 months
- Batch volume: ~250 GB/day raw data
- Records: ~180M rows/day across all sources
- Latency: hourly refresh for ad spend, daily refresh for finance tables
- Retention: 2 years in warehouse, 90 days raw landing storage
- SLA: 99.5% successful scheduled pipeline runs per month
Requirements
- Design a pipeline that ingests API and database data into a centralized platform with minimal manual operations.
- Orchestrate dependencies between ingestion, staging, transformation, and data quality checks (see the DAG sketch after this list).
- Support incremental loads, idempotent reruns, and backfills for the last 90 days (see the partition-replace sketch below).
- Implement validation for schema drift, null spikes, duplicate records, and row-count anomalies (see the check sketch below).
- Expose curated tables for BI dashboards and downstream analysts.
- Define monitoring, alerting, and operational ownership for failures and SLA breaches (see the alerting sketch below).
- Explain how you would choose and use tools to manage operations across scheduling, lineage, logging, and recovery.
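To make the orchestration requirement concrete, here is a minimal Airflow 2.x sketch of the dependency chain (Airflow runs as a managed service on AWS via Amazon MWAA). Task bodies are placeholders, and the DAG, task, and source names are illustrative assumptions rather than a prescribed implementation:

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

default_args = {
    "retries": 2,                        # retry transient failures before alerting
    "retry_delay": timedelta(minutes=5),
}

def _noop():
    """Placeholder body; real tasks would call extractors, loaders, or dbt."""

with DAG(
    dag_id="daily_reporting",            # illustrative name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
    default_args=default_args,
) as dag:
    # One ingestion task per source; all must land before staging begins.
    ingest = [
        PythonOperator(task_id=f"ingest_{src}", python_callable=_noop)
        for src in ("shopify", "google_ads", "meta_ads", "postgres_orders")
    ]
    stage = PythonOperator(task_id="load_staging", python_callable=_noop)
    transform = PythonOperator(task_id="run_transformations", python_callable=_noop)
    dq_checks = PythonOperator(task_id="data_quality_checks", python_callable=_noop)

    ingest >> stage >> transform >> dq_checks
```

Keeping quality checks as the final task means a failed check blocks the curated tables rather than silently publishing bad data.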
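For the incremental, idempotent, and backfill requirement, one common pattern is to make each run own exactly one date partition and replace it wholesale, so rerunning any day (or replaying all 90 days) converges to the same state. A minimal sketch, assuming a DB-API connection and illustrative landing/staging table names:

```python
from datetime import date, timedelta

def load_orders_partition(conn, ds: date) -> None:
    """Idempotently (re)load one day's partition: delete-then-insert inside a
    single transaction, so any rerun converges to the same final state."""
    with conn.cursor() as cur:
        # Remove whatever a previous (possibly partial) run left for this date.
        cur.execute("DELETE FROM staging.orders WHERE order_date = %s", (ds,))
        # Reload the full slice from the raw landing zone.
        cur.execute(
            "INSERT INTO staging.orders "
            "SELECT * FROM landing.orders_raw WHERE order_date = %s",
            (ds,),
        )
    conn.commit()  # both statements commit atomically (DB-API autocommit is off)

def backfill_last_90_days(conn) -> None:
    """Replay the retention window one partition at a time."""
    for offset in range(90):
        load_orders_partition(conn, date.today() - timedelta(days=offset))
```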
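For the validation requirement, a lightweight approach is a table of SQL probes executed after each load, failing the task when a threshold is breached. The queries, table names, and thresholds below are illustrative assumptions (a managed framework such as Great Expectations, or dbt tests, is an equally valid choice):

```python
# Each check maps a name to (SQL returning a single number, max allowed value).
CHECKS = {
    "duplicate_order_ids": (
        "SELECT COUNT(*) FROM (SELECT order_id FROM staging.orders "
        "GROUP BY order_id HAVING COUNT(*) > 1) AS dupes",
        0,
    ),
    "null_customer_ids": (
        "SELECT COUNT(*) FROM staging.orders WHERE customer_id IS NULL",
        100,  # tolerate a small, known trickle of guest checkouts
    ),
    "row_count_drop": (
        # Flag when today's volume falls below half the trailing 7-day mean.
        "SELECT CASE WHEN today < 0.5 * weekly_avg THEN 1 ELSE 0 END FROM ("
        "SELECT COUNT(*) FILTER (WHERE order_date = CURRENT_DATE) AS today, "
        "COUNT(*) / 7.0 AS weekly_avg FROM staging.orders "
        "WHERE order_date >= CURRENT_DATE - 7) t",
        0,
    ),
}

def run_checks(conn) -> None:
    """Raise on the first breached threshold so the orchestrator marks the
    task failed and holds downstream dashboard refreshes."""
    with conn.cursor() as cur:
        for name, (sql, max_allowed) in CHECKS.items():
            cur.execute(sql)
            (value,) = cur.fetchone()
            if value > max_allowed:
                raise ValueError(
                    f"Data quality check '{name}' failed: {value} > {max_allowed}"
                )
```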
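For monitoring and alerting, Airflow's failure callbacks can publish to an SNS topic that fans out to email or Slack, giving the on-call engineer the failing DAG, task, and run date. A minimal sketch; the topic ARN is a placeholder assumption:

```python
import boto3

# Placeholder assumption: an SNS topic wired to email/Slack for on-call.
ALERT_TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:data-pipeline-alerts"

def notify_failure(context):
    """Airflow on_failure_callback: fires after all retries are exhausted."""
    ti = context["task_instance"]
    boto3.client("sns").publish(
        TopicArn=ALERT_TOPIC_ARN,
        Subject=f"Pipeline failure: {ti.dag_id}.{ti.task_id}",
        Message=(
            f"Task {ti.dag_id}.{ti.task_id} failed for run {context['ds']} "
            f"after {ti.try_number} attempts. Check the task log for details."
        ),
    )

# Attach once via default_args so every task in every DAG is covered.
default_args = {"on_failure_callback": notify_failure}
```

Pairing this with SLA-breach alerts and a documented on-call rotation covers the operational-ownership side of the requirement.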
Constraints
- Existing cloud footprint is AWS.
- Team size is 3 data engineers and 1 analytics engineer.
- The budget rules out large always-on clusters.
- PII from the orders and customer tables must be encrypted at rest and access-controlled (see the encryption sketch after this list).
- Prefer managed services wherever they lower operational overhead.
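To honor the PII constraint on AWS, default encryption with a customer-managed KMS key can be enforced on the raw landing bucket, with access control handled through the key policy plus IAM. A minimal sketch; the bucket name and key alias are illustrative assumptions:

```python
import boto3

s3 = boto3.client("s3")

# Enforce SSE-KMS as the bucket default so every object written by any
# ingestion job is encrypted at rest with the customer-managed key.
s3.put_bucket_encryption(
    Bucket="brightcart-raw-landing",  # illustrative bucket name
    ServerSideEncryptionConfiguration={
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": "alias/brightcart-pii",  # illustrative alias
                },
                "BucketKeyEnabled": True,  # reduce per-object KMS request costs
            }
        ]
    },
)
```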