Context
AcmeCloud uses Marketo for lead capture and Salesforce Sales Cloud as the system of record for accounts, contacts, and opportunities. A managed sync that normally propagates lead updates every 5 minutes has started failing intermittently, causing stale CRM records and broken downstream reporting in Snowflake.
Today, operational data lands in Salesforce and Marketo first, is then replicated into Snowflake through Fivetran, and is transformed with dbt. The interview problem is to describe how you would triage the failure, isolate the most likely fault domain first, and build a resilient pipeline and monitoring layer so the sync can recover safely without creating duplicates or losing data.
Scale Requirements
- Lead updates: 1.2M/day average, 4M/day peak during campaigns
- Sync frequency: every 5 minutes
- Batch size: up to 25K records/run
- Latency target: Marketo update visible in Salesforce within 15 minutes
- Warehouse volume: 2 TB historical CRM + marketing data in Snowflake
- Recovery target: replay failed sync windows within 2 hours
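As a rough sanity check on these figures, the arithmetic can be sketched as below. This assumes updates arrive uniformly across the day and that replay traffic shares the same 25K-per-run batch cap; neither is stated in the problem, and real campaign traffic is likely burstier. The function name and its parameters are illustrative only.

```python
def capacity_summary(daily_records, interval_min=5, batch_cap=25_000, recovery_hours=2):
    """Back-of-envelope headroom for the sync, assuming uniform arrival."""
    runs_per_day = 24 * 60 // interval_min           # scheduled runs per day
    per_run = daily_records / runs_per_day           # average records per run
    spare_per_run = batch_cap - per_run              # unused batch slots per run
    recovery_runs = recovery_hours * 60 // interval_min
    drainable = spare_per_run * recovery_runs        # backlog clearable in the recovery window
    # Longest outage whose backlog fits into that spare capacity (assumption:
    # live traffic keeps flowing at the same rate while the replay runs).
    max_outage_min = drainable / per_run * interval_min
    return {
        "runs_per_day": runs_per_day,
        "daily_capacity": runs_per_day * batch_cap,
        "per_run": per_run,
        "spare_per_run": spare_per_run,
        "max_outage_min": max_outage_min,
    }


if __name__ == "__main__":
    for label, volume in (("average", 1_200_000), ("peak", 4_000_000)):
        summary = capacity_summary(volume)
        print(label, {k: round(v) for k, v in summary.items()})
```

Under these assumptions a peak day averages roughly 13,889 records per run against the 25K cap, so an outage of about 96 minutes is the longest that could be replayed within the 2-hour recovery target without raising the batch size or run frequency.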
Requirements
- Identify the first checks you would perform when the Marketo-to-Salesforce sync starts failing.
- Design a pipeline that captures sync status, API errors, rejected records, and replayable change events.
- Ensure idempotent upserts into Salesforce using stable business keys such as lead_id, email, or external IDs.
- Detect common failure modes: expired credentials, API quota exhaustion, schema drift, field-level validation failures, and dependency outages.
- Support selective backfills for failed windows without reprocessing successful records.
- Expose operational dashboards for sync health, backlog, rejection rate, and end-to-end freshness.
- Preserve an audit trail for every attempted sync and record-level outcome.
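One possible shape for the failure detection and selective backfill requirements is sketched below. It is not a prescribed answer: the category names, the `SyncAttempt` audit row, and both functions are hypothetical, and the error code strings are modeled on Salesforce REST API error codes that should be verified against the current API documentation before use.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, Optional, Set


def classify_failure(http_status: Optional[int], error_code: str = "") -> str:
    """Map an API response to one of the failure modes listed above."""
    if http_status == 401 or error_code == "INVALID_SESSION_ID":
        return "expired_credentials"
    if http_status == 429 or error_code == "REQUEST_LIMIT_EXCEEDED":
        return "quota_exhausted"
    if error_code in {"INVALID_FIELD", "INVALID_TYPE"}:
        return "schema_drift"
    if error_code in {"FIELD_CUSTOM_VALIDATION_EXCEPTION", "REQUIRED_FIELD_MISSING"}:
        return "validation_failure"
    # No status at all is treated as a timeout or connection failure.
    if http_status is None or http_status >= 500:
        return "dependency_outage"
    return "unknown"


@dataclass(frozen=True)
class SyncAttempt:
    """One audit-trail row: a single record-level outcome (hypothetical schema)."""
    window_start: datetime   # start of the 5-minute sync window
    record_key: str          # stable external ID used for the upsert
    status: str              # "success" or "failed"
    attempted_at: datetime


def records_to_replay(attempts: Iterable[SyncAttempt]) -> Dict[datetime, Set[str]]:
    """Return, per window, only the records whose latest attempt did not succeed."""
    latest: Dict[tuple, SyncAttempt] = {}
    for attempt in attempts:
        key = (attempt.window_start, attempt.record_key)
        if key not in latest or attempt.attempted_at > latest[key].attempted_at:
            latest[key] = attempt

    replay: Dict[datetime, Set[str]] = {}
    for (window, record_key), attempt in latest.items():
        if attempt.status != "success":
            replay.setdefault(window, set()).add(record_key)
    return replay
```

Keying the replay set on the window and the stable external ID means a record that failed and later succeeded is skipped, while the upsert on the same key keeps a repeated replay from creating duplicates.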
Constraints
- Existing stack is AWS + Snowflake + Airflow; avoid introducing heavy new infrastructure.
- Salesforce API quota is limited and shared with other internal tools.
- PII must be masked in logs and retained under SOC 2 controls.
- Team size is 3 data engineers and 1 RevOps engineer; solution should be operable by non-platform specialists.
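For the PII constraint, a minimal sketch of log masking using Python's standard logging module follows. It covers only email addresses; the regex, the token format, and the `MASKING_KEY` placeholder are assumptions for illustration, and a real deployment would load the key from a secrets manager and extend the masking to other PII fields.

```python
import hashlib
import hmac
import logging
import re

# Placeholder key for illustration only; load from a secrets store in practice.
MASKING_KEY = b"replace-me"

# Deliberately simple email pattern; not a full RFC 5322 matcher.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def _token(match: re.Match) -> str:
    """Replace an email with a keyed hash so log lines stay correlatable."""
    email = match.group(0).lower().encode("utf-8")
    digest = hmac.new(MASKING_KEY, email, hashlib.sha256).hexdigest()[:12]
    return f"<email:{digest}>"


def mask_pii(text: str) -> str:
    """Mask every email address found in a string."""
    return EMAIL_RE.sub(_token, text)


class PiiMaskingFilter(logging.Filter):
    """Logging filter that masks emails in the fully formatted message."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = mask_pii(record.getMessage())
        record.args = ()  # arguments are already merged into msg
        return True


if __name__ == "__main__":
    handler = logging.StreamHandler()
    handler.addFilter(PiiMaskingFilter())
    log = logging.getLogger("sync")
    log.addHandler(handler)
    log.warning("Upsert rejected for %s", "jane@example.com")
```

Attaching the filter to the handler rather than to individual loggers keeps the masking in one place, which may suit a small team that does not want to audit every log call.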