Analytics datasets often contain missing values, blank strings, duplicate records, and inconsistent categories. If these issues are not handled carefully, reports and downstream metrics become unreliable.
Explain how you would handle NULL values and dirty data when preparing a dataset for analytics in SQL. Your answer should cover:
NULLs from other dirty values such as empty strings, placeholder text, or invalid numbersCOALESCE, NULLIF, CASE WHEN, and aggregates behave with missing dataKeep the discussion practical and SQL-focused. The interviewer expects you to explain both the reasoning and the trade-offs, not just list functions. Use short PostgreSQL examples where helpful.