Interview Guides

Integrate CRM, Billing, and Support

Easy

Pipelines

Context

AcmeCloud, a B2B SaaS company, needs a unified customer data pipeline across Salesforce (CRM), Stripe (billing), and Zendesk (support). Today, each system is queried independently and analysts manually reconcile account IDs, causing inconsistent revenue, churn, and support-health reporting.

You need to design a production-grade pipeline that integrates these disparate systems into a single analytics model in Snowflake. The company wants both scheduled batch ingestion for full-system consistency and near-real-time updates for key customer events.

Scale Requirements

Sources: Salesforce (~8M accounts/contacts records), Stripe (~120M charge/invoice/payment events), Zendesk (~45M tickets/comments)
Change volume: ~2M source updates/day, with 150 events/sec peak from webhooks
Latency targets: <15 minutes for webhook-driven updates, <4 hours for daily full reconciliation
Storage: ~12 TB raw historical data, growing 400 GB/month
Consumers: 60 BI users, 8 downstream data products, 3 finance dashboards

Requirements

Ingest data from REST APIs, bulk exports, and webhooks from all three systems.
Normalize disparate schemas and map records to a canonical customer_id despite missing or conflicting identifiers.
Support both incremental loads and backfills without creating duplicates.
Build analytics-ready tables for customer 360, MRR, failed payments, open support escalations, and churn risk inputs.
Implement data quality checks for schema drift, null keys, duplicate events, and referential integrity.
Orchestrate dependencies so raw ingestion, staging transforms, and marts run reliably and can be replayed.
Expose lineage, freshness, and load status for stakeholders.

Constraints

AWS is the required cloud; Snowflake is already the warehouse.
Team size is 3 data engineers; operational complexity should stay moderate.
Budget for new infrastructure is limited to $18K/month.
Compliance: PII must be encrypted at rest, and deletion requests must propagate within 72 hours.

Integrate CRM, Billing, and Support

Easy

Pipelines

Context

Scale Requirements

Sources: Salesforce (~8M accounts/contacts records), Stripe (~120M charge/invoice/payment events), Zendesk (~45M tickets/comments)
Change volume: ~2M source updates/day, with 150 events/sec peak from webhooks
Latency targets: <15 minutes for webhook-driven updates, <4 hours for daily full reconciliation
Storage: ~12 TB raw historical data, growing 400 GB/month
Consumers: 60 BI users, 8 downstream data products, 3 finance dashboards

Requirements

Ingest data from REST APIs, bulk exports, and webhooks from all three systems.
Normalize disparate schemas and map records to a canonical customer_id despite missing or conflicting identifiers.
Support both incremental loads and backfills without creating duplicates.
Build analytics-ready tables for customer 360, MRR, failed payments, open support escalations, and churn risk inputs.
Implement data quality checks for schema drift, null keys, duplicate events, and referential integrity.
Orchestrate dependencies so raw ingestion, staging transforms, and marts run reliably and can be replayed.
Expose lineage, freshness, and load status for stakeholders.

Constraints

AWS is the required cloud; Snowflake is already the warehouse.
Team size is 3 data engineers; operational complexity should stay moderate.
Budget for new infrastructure is limited to $18K/month.
Compliance: PII must be encrypted at rest, and deletion requests must propagate within 72 hours.

Your Answer

Integrate CRM, Billing, and Support

Easy

Pipelines

Context

Scale Requirements

Sources: Salesforce (~8M accounts/contacts records), Stripe (~120M charge/invoice/payment events), Zendesk (~45M tickets/comments)
Change volume: ~2M source updates/day, with 150 events/sec peak from webhooks
Latency targets: <15 minutes for webhook-driven updates, <4 hours for daily full reconciliation
Storage: ~12 TB raw historical data, growing 400 GB/month
Consumers: 60 BI users, 8 downstream data products, 3 finance dashboards

Requirements

Ingest data from REST APIs, bulk exports, and webhooks from all three systems.
Normalize disparate schemas and map records to a canonical customer_id despite missing or conflicting identifiers.
Support both incremental loads and backfills without creating duplicates.
Build analytics-ready tables for customer 360, MRR, failed payments, open support escalations, and churn risk inputs.
Implement data quality checks for schema drift, null keys, duplicate events, and referential integrity.
Orchestrate dependencies so raw ingestion, staging transforms, and marts run reliably and can be replayed.
Expose lineage, freshness, and load status for stakeholders.

Constraints

AWS is the required cloud; Snowflake is already the warehouse.
Team size is 3 data engineers; operational complexity should stay moderate.
Budget for new infrastructure is limited to $18K/month.
Compliance: PII must be encrypted at rest, and deletion requests must propagate within 72 hours.

Integrate CRM, Billing, and Support

Easy

Pipelines

Context

Scale Requirements

Sources: Salesforce (~8M accounts/contacts records), Stripe (~120M charge/invoice/payment events), Zendesk (~45M tickets/comments)
Change volume: ~2M source updates/day, with 150 events/sec peak from webhooks
Latency targets: <15 minutes for webhook-driven updates, <4 hours for daily full reconciliation
Storage: ~12 TB raw historical data, growing 400 GB/month
Consumers: 60 BI users, 8 downstream data products, 3 finance dashboards

Requirements

Ingest data from REST APIs, bulk exports, and webhooks from all three systems.
Normalize disparate schemas and map records to a canonical customer_id despite missing or conflicting identifiers.
Support both incremental loads and backfills without creating duplicates.
Build analytics-ready tables for customer 360, MRR, failed payments, open support escalations, and churn risk inputs.
Implement data quality checks for schema drift, null keys, duplicate events, and referential integrity.
Orchestrate dependencies so raw ingestion, staging transforms, and marts run reliably and can be replayed.
Expose lineage, freshness, and load status for stakeholders.

Constraints

AWS is the required cloud; Snowflake is already the warehouse.
Team size is 3 data engineers; operational complexity should stay moderate.
Budget for new infrastructure is limited to $18K/month.
Compliance: PII must be encrypted at rest, and deletion requests must propagate within 72 hours.