You are given a dataset with missing values, inconsistent formats, duplicate records, and obvious outliers. In a real reporting workflow, these issues can distort aggregates, joins, and downstream metrics if you do not handle them deliberately.
How do you approach a dataset that has a significant amount of missing or dirty data? Explain how you would identify the issues, decide what to keep or discard, and standardize values before analysis. Include how you would handle NULLs, invalid dates or numbers, duplicate rows, and inconsistent categorical values.
Keep your answer practical and SQL-focused. The interviewer expects you to discuss the order of operations, trade-offs between filtering and imputing, and how you would use SQL to profile and clean the data without silently biasing results.
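As a rough illustration of the profile-first, clean-second order of operations, the sketch below runs SQL against an in-memory SQLite database from Python. The table `raw_orders`, its columns, and the sample rows are all hypothetical, and the validity checks are deliberately simple: the `GLOB` test only rejects characters other than digits and dots, and SQLite's `date()` does not reject every impossible calendar date. Other engines have their own safe-cast functions (for example `TRY_CAST` in some dialects), so treat this as one possible shape for an answer rather than a definitive method.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical raw table: every column arrives as text, issues planted on purpose.
conn.executescript("""
CREATE TABLE raw_orders (
    order_id INTEGER, customer TEXT, region TEXT, order_date TEXT, amount TEXT
);
INSERT INTO raw_orders VALUES
  (1, 'Ann', 'US',   '2024-01-05', '100.50'),
  (1, 'Ann', 'US',   '2024-01-05', '100.50'),  -- exact duplicate
  (2, 'Bob', ' us ', '2024-13-40', '20'),      -- invalid date, messy region
  (3, 'Cy',  'U.S.', NULL,         'abc'),     -- NULL date, non-numeric amount
  (4, NULL,  'EU',   '2024-02-10', NULL);      -- NULL customer and amount
""")

# Step 1: profile before changing anything, so every later decision is measurable.
PROFILE_SQL = """
SELECT
    COUNT(*)                        AS total_rows,
    COUNT(*) - COUNT(customer)      AS null_customer,
    COUNT(*) - COUNT(order_date)    AS null_date,
    SUM(order_date IS NOT NULL
        AND date(order_date) IS NOT order_date)  AS bad_date,
    SUM(amount IS NOT NULL
        AND amount GLOB '*[^0-9.]*')             AS bad_amount,
    COUNT(*) - (SELECT COUNT(*)
                FROM (SELECT DISTINCT * FROM raw_orders)) AS duplicate_rows
FROM raw_orders
"""
columns = ["total_rows", "null_customer", "null_date",
           "bad_date", "bad_amount", "duplicate_rows"]
profile = dict(zip(columns, conn.execute(PROFILE_SQL).fetchone()))

# Step 2: deduplicate first, then standardize categories and null out
# values that fail validation instead of silently dropping the rows.
CLEAN_SQL = """
WITH deduped AS (
    SELECT DISTINCT order_id, customer, region, order_date, amount
    FROM raw_orders
)
SELECT
    order_id,
    customer,
    CASE UPPER(REPLACE(TRIM(region), '.', ''))
        WHEN 'US' THEN 'US'
        WHEN 'EU' THEN 'EU'
        ELSE 'UNKNOWN'
    END AS region,
    CASE WHEN date(order_date) IS order_date THEN order_date END AS order_date,
    CASE WHEN amount NOT GLOB '*[^0-9.]*' THEN CAST(amount AS REAL) END AS amount
FROM deduped
ORDER BY order_id
"""
cleaned = conn.execute(CLEAN_SQL).fetchall()

print(profile)
for row in cleaned:
    print(row)
```

Keeping invalid values as NULL rather than filtering the rows out preserves row counts for joins and makes the trade-off explicit: an aggregate such as `AVG(amount)` skips the NULLs, while any imputation would have to be a separate, documented step.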