
Missing data shows up constantly in product analysis, especially in event logs and profile fields on surfaces like Quora feed impressions, answers, and signup flows. Interviewers want to see that you can distinguish between truly missing values, valid zeroes, and rows that should be excluded.
Explain how you handle missing data in a SQL-based analysis. You should describe how you identify missing values, decide whether to filter, impute, or label them, and how functions like COALESCE, CASE WHEN, and aggregate functions affect the result. You should also explain how your approach changes depending on whether the missing field is a metric, dimension, or date.
Keep your answer practical rather than theoretical. The interviewer expects you to discuss common SQL patterns, trade-offs, and how missing data can bias product metrics if handled incorrectly.
In SQL, NULL means the value is unknown or missing, which is different from a real value like 0 or ''. Treating them as equivalent can distort metrics such as counts, averages, and conversion rates.
SELECT user_id, views, COALESCE(views, 0) AS views_filled
FROM quora_daily_activity;
You should not automatically fill all missing values. Sometimes the right choice is to exclude rows, sometimes to replace missing values with a default, and sometimes to keep them as a separate category so you can measure data quality explicitly.
SELECT
CASE WHEN country IS NULL THEN 'Unknown' ELSE country END AS country_group,
COUNT(*) AS users
FROM quora_users
GROUP BY country_group;
Most aggregate functions such as AVG, SUM, and COUNT(column) ignore NULL, while COUNT(*) counts rows regardless of missing values. This difference matters when you are calculating rates or summarizing incomplete event data.
SELECT
COUNT(*) AS total_rows,
COUNT(session_length_seconds) AS rows_with_session_length,
AVG(session_length_seconds) AS avg_session_length
FROM quora_sessions;
COALESCE is useful when you want a fallback value, but it should reflect a real business assumption. CASE WHEN is better when you need to separate missing values from valid values and make the logic explicit in the output.
SELECT
SUM(CASE WHEN upvotes IS NULL THEN 1 ELSE 0 END) AS missing_upvotes,
SUM(COALESCE(upvotes, 0)) AS total_upvotes_assuming_missing_is_zero
FROM quora_answer_stats;
A strong analysis compares results under different missing-data treatments and explains the implications. This helps you avoid presenting a metric that looks precise but is actually driven by incomplete or biased data.
SELECT
AVG(read_time_seconds) AS avg_excluding_nulls,
AVG(COALESCE(read_time_seconds, 0)) AS avg_treating_null_as_zero
FROM quora_feed_events;