You need to backfill terabytes of historical data into a pipeline that is already processing live events. The key challenge is keeping downstream tables correct and queryable without pausing real-time ingestion or introducing duplicates, stale overwrites, or long freshness regressions.
- Historical and live data may overlap on the same business keys
- Backfill jobs can compete with streaming jobs for compute and warehouse capacity
- Late historical records must not overwrite newer real-time state incorrectly
- Downstream consumers still expect near-real-time freshness

- Backfilling strategy
- Stream processing design
- Idempotent writes and replay safety
- Data quality gates and reconciliation
- Orchestration of bounded and unbounded workloads
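One way to reason about the overwrite and replay-safety requirements is an event-time guard on every write: a row is replaced only when the incoming record is strictly newer than what is stored. The sketch below is a minimal illustration under assumptions, not a prescribed design. The `Record` type, the `merge_record` function, and the in-memory dict standing in for the downstream table are all hypothetical; in a real warehouse the same comparison would typically live in a merge or upsert condition.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Record:
    """Hypothetical row shape shared by the live stream and the backfill."""
    key: str               # business key on which live and historical data overlap
    event_time: datetime   # when the event occurred at the source
    payload: dict


def merge_record(table: dict, incoming: Record) -> bool:
    """Write `incoming` only if it is strictly newer than the stored row.

    Returns True when the table changed. Equal timestamps are skipped,
    so replaying the same record twice leaves the table unchanged.
    """
    current = table.get(incoming.key)
    if current is not None and incoming.event_time <= current.event_time:
        return False  # stale or duplicate: keep the existing state
    table[incoming.key] = incoming
    return True


# The dict is a stand-in for the downstream table (an assumption for the sketch).
table = {}
live = Record("order-1", datetime(2024, 5, 2, tzinfo=timezone.utc), {"status": "shipped"})
historical = Record("order-1", datetime(2024, 1, 1, tzinfo=timezone.utc), {"status": "created"})

applied_live = merge_record(table, live)              # live event lands first
applied_backfill = merge_record(table, historical)    # older backfill row is rejected
applied_replay = merge_record(table, live)            # replay of the same event is a no-op
```

Because the guard depends only on the key and the event time, the backfill and the stream can write in any order and reach the same final state. The guard does not address capacity contention or freshness; those would need separate handling, such as throttling the bounded job.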