Context
Northstar Data Consulting is implementing a new analytics platform for a mid-market retail client. The client currently relies on ad hoc Python scripts, manual CSV uploads, and a legacy SQL Server reporting database; reporting delivery is slow and brittle, and the setup is difficult to support across multiple consulting engagements.
You are asked to design a repeatable tool-selection approach and target pipeline architecture that your consulting team can deploy for this client and reuse for similar implementations.
Scale Requirements
- Sources: Shopify, NetSuite, Salesforce, PostgreSQL, and SFTP-delivered CSV files
- Batch volume: 250 GB/day raw ingest, growing 20% YoY
- Tables/files: ~1,200 source objects, 150 business-critical datasets
- Freshness: finance data every 4 hours; sales and inventory data every 15 minutes
- Users: 80 BI users, 12 analysts, 4 data engineers
- Retention: 2 years hot storage, 7 years archived for audit
Requirements
- Propose how you would choose tools for ingestion, transformation, orchestration, storage, and data quality in a consulting implementation.
- Design a pipeline that supports both ELT for SaaS/database sources and ETL for messy file-based feeds.
- Define criteria for build-vs-buy decisions, including implementation speed, maintainability, client skill set, observability, and total cost of ownership.
- Ensure pipelines are idempotent, support backfills, and allow a new source to be onboarded in less than 3 days (see the idempotent-load sketch after this list).
- Include a monitoring and alerting strategy for failed loads, schema drift, freshness SLA breaches, and data quality regressions (a minimal check is sketched after this list).
- Describe how your design would standardize delivery across clients while allowing client-specific customization (see the declarative source-spec sketch after this list).
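
To make the idempotency and backfill requirement concrete, a minimal sketch of a partition-overwrite load into Redshift is shown below. The table names, connection helper, and partition column are hypothetical; the pattern itself (delete the target date range, then reinsert from staging inside one transaction) is what makes reruns and backfills safe.

```python
# Minimal sketch of an idempotent, backfill-friendly load into Redshift.
# get_redshift_conn, the table names, and the order_date partition column
# are illustrative assumptions, not part of the client's current setup.
from datetime import date

import psycopg2


def get_redshift_conn():
    # Placeholder: in practice, pull credentials from Secrets Manager or an
    # IAM-based helper rather than hard-coding them here.
    return psycopg2.connect(
        host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
        port=5439,
        dbname="analytics",
        user="loader",
        password="***",
    )


def load_partition(run_date: date, target: str = "mart.daily_sales",
                   staging: str = "staging.daily_sales") -> None:
    """Delete-then-insert one date partition so a rerun or backfill
    produces the same result as a first run (idempotent)."""
    conn = get_redshift_conn()
    try:
        with conn, conn.cursor() as cur:
            # Both statements share one transaction: a failed insert rolls
            # back the delete, so the target is never left half-loaded.
            cur.execute(
                f"DELETE FROM {target} WHERE order_date = %s", (run_date,)
            )
            cur.execute(
                f"INSERT INTO {target} "
                f"SELECT * FROM {staging} WHERE order_date = %s",
                (run_date,),
            )
    finally:
        conn.close()


if __name__ == "__main__":
    # A backfill is just the same function replayed over a date range.
    for day in (date(2024, 1, 1), date(2024, 1, 2)):
        load_partition(day)
```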
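The monitoring requirement can be anchored in a similarly small check that the client's two post-handoff engineers can run from the orchestrator. The dataset names, SLAs, and alert hook below are assumptions taken from the freshness targets in the brief; the idea is to compare each dataset's latest load timestamp against its SLA and to diff live column sets against a stored schema snapshot, alerting on either condition.

```python
# Sketch of freshness-SLA and schema-drift checks, intended to run as a
# scheduled task. All dataset names, SLAs, and the alert hook are
# illustrative assumptions, not a specific monitoring product's API.
from datetime import datetime, timedelta, timezone

# Hypothetical per-dataset SLAs mirroring the brief: finance every 4 hours,
# sales and inventory every 15 minutes.
FRESHNESS_SLAS = {
    "finance.gl_postings": timedelta(hours=4),
    "sales.orders": timedelta(minutes=15),
    "inventory.stock_levels": timedelta(minutes=15),
}

EXPECTED_SCHEMAS = {
    "sales.orders": {"order_id", "order_date", "customer_id", "amount"},
}


def check_freshness(latest_loaded_at: dict[str, datetime]) -> list[str]:
    """Return alert messages for datasets whose last load breaches its SLA."""
    now = datetime.now(timezone.utc)
    alerts = []
    for dataset, sla in FRESHNESS_SLAS.items():
        loaded_at = latest_loaded_at.get(dataset)
        if loaded_at is None or now - loaded_at > sla:
            alerts.append(f"FRESHNESS: {dataset} exceeded SLA of {sla}")
    return alerts


def check_schema_drift(live_schemas: dict[str, set[str]]) -> list[str]:
    """Return alert messages for columns added or dropped vs. the snapshot."""
    alerts = []
    for dataset, expected in EXPECTED_SCHEMAS.items():
        live = live_schemas.get(dataset, set())
        added, dropped = live - expected, expected - live
        if added or dropped:
            alerts.append(
                f"SCHEMA DRIFT: {dataset} added={sorted(added)} "
                f"dropped={sorted(dropped)}"
            )
    return alerts


def send_alerts(alerts: list[str]) -> None:
    # Placeholder: route to Slack, SNS, or PagerDuty in a real deployment.
    for message in alerts:
        print(message)
```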
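One way to reconcile cross-client standardization with client-specific customization, and to hit the sub-three-day onboarding target, is to drive every source from a small declarative spec consumed by shared loader code. The field names and loader registry below are illustrative assumptions, not an existing framework; the point is that onboarding a new source means writing a spec, not new pipeline code.

```python
# Sketch of a declarative source spec: the consulting team maintains the
# shared loaders, and each client supplies only specs like these (or
# equivalent YAML files). Field names and loaders are illustrative.
from typing import Any, Callable

SOURCES: list[dict[str, Any]] = [
    {
        "name": "shopify_orders",
        "type": "saas_elt",          # ELT path: extract-load, transform in Redshift
        "schedule": "*/15 * * * *",  # 15-minute freshness for sales data
        "incremental_key": "updated_at",
        "target": "raw.shopify_orders",
    },
    {
        "name": "vendor_invoices_csv",
        "type": "sftp_etl",          # ETL path: clean messy files before loading
        "schedule": "0 */4 * * *",   # 4-hour freshness for finance data
        "file_pattern": "invoices_*.csv",
        "target": "raw.vendor_invoices",
        "sox_audited": True,         # flag drives extra logging and retention
    },
]


def load_saas_source(spec: dict[str, Any]) -> None:
    print(f"ELT load for {spec['name']} into {spec['target']}")


def load_sftp_source(spec: dict[str, Any]) -> None:
    print(f"ETL load for {spec['name']} from {spec['file_pattern']}")


# Shared, client-agnostic dispatch; client-specific behavior lives in the specs.
LOADERS: dict[str, Callable[[dict[str, Any]], None]] = {
    "saas_elt": load_saas_source,
    "sftp_etl": load_sftp_source,
}

if __name__ == "__main__":
    for spec in SOURCES:
        LOADERS[spec["type"]](spec)
```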
Constraints
- Client prefers AWS and already has S3 and Redshift contracts
- Incremental tooling budget is capped at $12K/month
- Small support team after handoff: 2 client data engineers
- SOX-related auditability is required for finance datasets
- Consulting team must minimize custom code and avoid tools requiring deep platform specialization