You're working on a data platform and need to reload a large historical dataset without disrupting the live pipeline that powers downstream reporting and operational use cases. The challenge is balancing a one-time or periodic bulk backfill with continuous event processing, while keeping outputs correct and avoiding duplicate or conflicting writes.
How do you handle backfilling terabytes of data while maintaining real-time processing?
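One way to make the "avoid duplicate or conflicting writes" requirement concrete is a version-aware, idempotent upsert. Backfill rows and live events land in the same keyed sink, and a row replaces the stored one only if its version is strictly newer. The minimal sketch below assumes each record carries a business key and a per-key monotonically increasing version, such as a source update timestamp or sequence number. `Record`, `VersionedSink`, and `upsert` are hypothetical names, and the in-memory dict stands in for a real table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    key: str      # business key, e.g. an order id (hypothetical)
    version: int  # assumed monotonically increasing per key (e.g. source update time)
    payload: dict
    source: str   # "backfill" or "live"; informational only, never used to resolve conflicts


class VersionedSink:
    """Hypothetical in-memory stand-in for a keyed table that accepts upserts."""

    def __init__(self) -> None:
        self.rows: dict[str, Record] = {}

    def upsert(self, rec: Record) -> bool:
        """Write rec only if it is newer than the stored row; return True if written."""
        current = self.rows.get(rec.key)
        # Replayed rows and stale backfill rows become no-ops, so re-running a
        # backfill chunk is safe and cannot clobber fresher live data.
        if current is not None and current.version >= rec.version:
            return False
        self.rows[rec.key] = rec
        return True


if __name__ == "__main__":
    sink = VersionedSink()
    sink.upsert(Record("order-1", 5, {"amount": 10}, "live"))
    # A historical row for the same key arrives later from the backfill; it is older, so it is skipped.
    sink.upsert(Record("order-1", 3, {"amount": 7}, "backfill"))
    print(sink.rows["order-1"].payload)
```

Because the winner depends only on versions, the final state is the same whether the backfill or the live stream reaches a key first. This is the property that lets the two paths run concurrently. The sketch assumes versions are unique per distinct state of a key. If two sources can emit the same version with different payloads, a deterministic tie-breaker is still needed.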