Context
MTX Group runs large-scale public sector implementations on Salesforce, where case updates, citizen service requests, payments, and integration events can spike sharply during enrollment windows and emergency response periods. The current pattern relies on scheduled bulk extracts and point-to-point integrations, causing API contention, stale downstream data, and occasional duplicate writes back into Salesforce.
Design a pipeline architecture that can ingest, process, and reconcile high-volume Salesforce transactions while protecting Salesforce governor limits and delivering near-real-time data to analytics and downstream operational systems.
Scale Requirements
- Peak ingest: 120K Salesforce change events/minute across Case, Account, Contact, Opportunity, and custom objects
- Average volume: 35M record changes/day
- Payload size: 3-8 KB/event after enrichment
- Latency target: < 2 minutes from Salesforce change to curated warehouse table
- Writeback SLA: outbound updates to Salesforce completed within 5 minutes for 99% of records
- Retention: 13 months raw history, 7 years curated audit data
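The figures above can be turned into per-second rates and storage estimates as a sanity check before choosing components. The sketch below is back-of-envelope arithmetic only. It assumes decimal units (1 MB = 1000 KB), 30-day months for the raw retention window, and no compression or replication overhead, none of which are stated in the brief.

```python
# Back-of-envelope sizing derived from the stated scale requirements.
PEAK_EVENTS_PER_MIN = 120_000
AVG_CHANGES_PER_DAY = 35_000_000
PAYLOAD_KB_RANGE = (3, 8)        # per event, after enrichment
RAW_RETENTION_DAYS = 13 * 30     # assumption: 30-day months


def sizing() -> dict:
    """Return peak/average rates and rough storage figures."""
    peak_eps = PEAK_EVENTS_PER_MIN / 60
    avg_eps = AVG_CHANGES_PER_DAY / 86_400
    lo_kb, hi_kb = PAYLOAD_KB_RANGE
    # Daily raw volume in GB at average load (uncompressed).
    daily_gb = (AVG_CHANGES_PER_DAY * lo_kb / 1e6,
                AVG_CHANGES_PER_DAY * hi_kb / 1e6)
    return {
        "peak_events_per_sec": peak_eps,
        "avg_events_per_sec": avg_eps,
        "peak_to_avg_ratio": peak_eps / avg_eps,
        "peak_mb_per_sec": (peak_eps * lo_kb / 1000, peak_eps * hi_kb / 1000),
        "daily_gb": daily_gb,
        "raw_retention_tb": tuple(gb * RAW_RETENTION_DAYS / 1000 for gb in daily_gb),
    }


if __name__ == "__main__":
    for name, value in sizing().items():
        print(name, value)
```

Under these assumptions the peak is 2,000 events/s (6 to 16 MB/s), roughly five times the average rate of about 405 events/s, so the streaming path needs to be sized for the burst rather than the daily mean.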
Requirements
- Design ingestion from Salesforce using Salesforce Change Data Capture for the streaming path, with Bulk API 2.0 extracts and MTX Group integration services for backfills and replay.
- Support both streaming and batch paths: streaming for operational freshness, batch for historical reprocessing and reconciliation.
- Implement idempotent processing for duplicate CDC events, retries, and replayed bulk loads.
- Enforce data quality checks: schema validation, required-field checks, referential integrity, and source-to-target reconciliation.
- Build a curated model for downstream reporting and operational APIs, including slowly changing dimension (SCD) handling for selected dimensions.
- Orchestrate dependencies, replay jobs, and failure recovery with clear run-state visibility.
- Design monitoring for Salesforce API consumption, consumer lag, freshness, and failed writebacks.
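One possible shape for the idempotent processing requirement is sketched below. It assumes each change has been flattened into a dict whose keys (`transaction_key`, `sequence_number`, `record_id`, `commit_number`, `changed_fields`) are hypothetical names loosely modeled on the Salesforce `ChangeEventHeader`. The in-memory set and dict stand in for a durable store; a real pipeline would keep this state in a database or the warehouse itself.

```python
from dataclasses import dataclass, field


@dataclass
class IdempotentApplier:
    """Applies change events at most once and ignores out-of-date ones."""
    seen: set = field(default_factory=set)      # dedup keys already processed
    state: dict = field(default_factory=dict)   # record_id -> (commit_number, fields)

    def apply(self, event: dict) -> str:
        # Dedup key: the same event redelivered by a retry or replay maps here.
        key = (event["transaction_key"], event["sequence_number"], event["record_id"])
        if key in self.seen:
            return "duplicate"
        self.seen.add(key)

        current = self.state.get(event["record_id"])
        # Version guard: skip events older than what is already stored.
        # A late event is skipped whole; merging out-of-order partial updates
        # would need per-field versions, which this sketch does not track.
        if current is not None and current[0] >= event["commit_number"]:
            return "stale"

        merged = dict(current[1]) if current else {}
        merged.update(event["changed_fields"])
        self.state[event["record_id"]] = (event["commit_number"], merged)
        return "applied"


if __name__ == "__main__":
    applier = IdempotentApplier()
    e1 = {"transaction_key": "t1", "sequence_number": 1, "record_id": "500A",
          "commit_number": 10, "changed_fields": {"Status": "New"}}
    print(applier.apply(e1))   # first delivery
    print(applier.apply(e1))   # redelivery of the same event
```

The same guard covers replayed bulk loads if each extracted row is given a synthetic key and a version, for example the row's last-modified timestamp, which is again an assumption rather than something the brief prescribes.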
Constraints
- Salesforce API and event delivery limits must not be exceeded.
- Platform must run in an AWS environment already used by MTX Group delivery teams.
- PII must be encrypted in transit and at rest; auditability is required for public sector clients.
- Team is small: assume 5 engineers, so operational simplicity matters as much as throughput.
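To make the API limit constraint concrete, the sketch below shows one simple way a shared budget could gate outbound Salesforce calls so that batch extracts cannot starve writebacks. The class name, the two priority labels, and the 20% reserve are illustrative assumptions. The `sync` method assumes the caller periodically reads the org's maximum and remaining daily request counts from Salesforce and passes them in.

```python
class ApiBudget:
    """Tracks daily API call usage and holds back a reserve for writebacks."""

    def __init__(self, daily_limit: int, reserve_fraction: float = 0.2):
        self.daily_limit = daily_limit
        self.reserve_fraction = reserve_fraction
        self.used = 0

    @property
    def reserve(self) -> int:
        # Calls held back so SLA-bound writebacks still fit late in the day.
        return int(self.daily_limit * self.reserve_fraction)

    def sync(self, max_calls: int, remaining: int) -> None:
        """Overwrite local counters with figures reported by Salesforce."""
        self.daily_limit = max_calls
        self.used = max_calls - remaining

    def try_acquire(self, calls: int = 1, priority: str = "batch") -> bool:
        """Reserve `calls` if the budget allows it; return False otherwise."""
        # Writebacks may use the full limit; everything else stops at the reserve.
        ceiling = self.daily_limit
        if priority != "writeback":
            ceiling -= self.reserve
        if self.used + calls > ceiling:
            return False
        self.used += calls
        return True


if __name__ == "__main__":
    budget = ApiBudget(daily_limit=1000)
    print(budget.try_acquire(800, priority="batch"))      # fits under the batch ceiling
    print(budget.try_acquire(1, priority="batch"))        # would eat into the reserve
    print(budget.try_acquire(200, priority="writeback"))  # reserve is available
```

A single in-process counter like this is only a starting point. With several workers the counter would need to live in a shared store, and a small team may prefer that to running a separate rate-limiting service.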