You're working on a data pipeline where source data can arrive with missing fields, duplicate records, or conflicting values across systems. You want a practical approach for keeping downstream reporting usable while preserving traceability back to the raw inputs.
How would you handle a dataset that is incomplete or inconsistent?