Handling Missing Values in SQL | Dataford Interview Questions

Easy

SQL & Data Manipulation

Asked at 1 company

Tags: ETL, Data Wrangling, Quality

Problem

Context

In Capital One analytics workflows, especially when working with transaction, customer, or credit performance data in Snowflake- or PostgreSQL-backed reporting layers, missing values can materially change downstream metrics. Interviewers ask this to assess whether you can distinguish between data cleaning, business logic, and metric integrity.

Core Question

Explain how you would handle a dataset with missing values using SQL. Your answer should cover:

How you identify missing data (NULLs, blanks, placeholder values)
When to filter rows out versus impute values
How functions like COALESCE and CASE WHEN help in analysis
Risks of handling missing values incorrectly in aggregates and reporting

Scope Guidance

Keep the discussion practical and SQL-focused. The interviewer is not looking for advanced statistical imputation; they want a clear framework for profiling missingness, choosing an appropriate treatment based on business meaning, and implementing that logic safely in PostgreSQL.

Key Concepts

Profiling missing values

Before changing data, quantify how much is missing and in which columns. In SQL, this usually means counting NULLs, blank strings, and known placeholder values so you understand whether the issue is isolated or systemic.

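-- Profile missingness before changing any data
SELECT
  COUNT(*) AS total_rows,
  COUNT(*) FILTER (WHERE income IS NULL) AS null_income_rows,
  COUNT(*) FILTER (WHERE TRIM(employment_status) = '') AS blank_status_rows,
  -- Placeholder codes are assumed here; confirm the real list with the data owner
  COUNT(*) FILTER (WHERE employment_status IN ('Unknown', 'N/A')) AS placeholder_status_rows
FROM customer_applications;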
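
Raw counts are easier to judge as rates, since a handful of missing rows and a systemic gap call for different treatment. The following is a minimal sketch that reuses the column names from the profiling query; the two-decimal rounding and the output aliases are illustrative choices, not part of the original question.

```sql
-- Sketch: express missingness as a percentage of all rows (PostgreSQL syntax).
SELECT
  COUNT(*) AS total_rows,
  -- COUNT(column) skips NULLs, so the difference is the NULL count.
  COUNT(*) - COUNT(income) AS null_income_rows,
  -- NULLIF guards against division by zero on an empty table.
  ROUND(100.0 * (COUNT(*) - COUNT(income)) / NULLIF(COUNT(*), 0), 2) AS pct_null_income,
  ROUND(100.0 * COUNT(*) FILTER (WHERE TRIM(employment_status) = '')
        / NULLIF(COUNT(*), 0), 2) AS pct_blank_status
FROM customer_applications;
```

A low percentage may justify simple exclusion, while a high one usually points to an upstream data issue worth raising before any metric is reported.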

Different missing-value types

Not all missing values are stored as NULL. Some datasets use empty strings, zeros, or codes like 'Unknown' or 'N/A', and those should often be standardized before analysis.

SELECT
  CASE
    WHEN employer_name IS NULL THEN 'NULL'
    WHEN TRIM(employer_name) = '' THEN 'blank'
    -- Normalize case and whitespace so ' n/a ' and 'UNKNOWN' are also caught
    WHEN UPPER(TRIM(employer_name)) IN ('UNKNOWN', 'N/A') THEN 'placeholder'
    ELSE 'valid'
  END AS employer_name_status,
  COUNT(*) AS row_count
FROM customer_applications
GROUP BY employer_name_status;
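
Once the patterns are known, one way to standardize them is to map every variant to NULL so later logic only has to handle a single missing-value representation. This is a sketch under assumptions: the placeholder list ('Unknown', 'N/A') is taken from the example above and would need confirming against the real data, and employer_name_clean is a hypothetical output name.

```sql
-- Sketch: collapse blanks and placeholder codes into NULL at query time.
SELECT
  application_id,
  CASE
    -- Assumed placeholder codes, compared case- and whitespace-insensitively.
    WHEN UPPER(TRIM(employer_name)) IN ('UNKNOWN', 'N/A') THEN NULL
    -- NULLIF turns an empty (or whitespace-only) string into NULL.
    ELSE NULLIF(TRIM(employer_name), '')
  END AS employer_name_clean
FROM customer_applications;
```

Doing this in a view or staging query, rather than updating the source table, keeps the original values available for audit.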

Filtering vs imputation

You should only fill missing values when there is a defensible business rule. If a field is required for a metric, excluding incomplete rows may be more accurate than defaulting to zero and biasing the result.

SELECT AVG(monthly_income)
FROM customer_applications
WHERE monthly_income IS NOT NULL;
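
When rows are excluded, it helps to report how many were dropped so readers can judge how representative the metric is. A minimal sketch, assuming the same table and column as above; the aliases are illustrative.

```sql
-- Sketch: report the average together with its coverage.
-- No WHERE clause is needed because AVG and COUNT(column) already skip NULLs,
-- and leaving it out lets COUNT(*) still see every row.
SELECT
  AVG(monthly_income) AS avg_monthly_income,
  COUNT(monthly_income) AS rows_used,
  COUNT(*) - COUNT(monthly_income) AS rows_excluded
FROM customer_applications;
```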

Using COALESCE and CASE WHEN safely

COALESCE is useful for replacing NULL values at query time, but the replacement must reflect business meaning. CASE WHEN is better when different missing-value patterns need different treatment or labeling.

SELECT
  application_id,
  COALESCE(reported_bonus, 0) AS reported_bonus,
  CASE
    WHEN debt_to_income_ratio IS NULL THEN 'missing'
    WHEN debt_to_income_ratio > 0.4 THEN 'high'
    ELSE 'normal'
  END AS dti_bucket
FROM customer_applications;
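
A defaulted value is indistinguishable from a genuine one once COALESCE has run, so one option is to carry a flag alongside it. The sketch below assumes that a missing reported_bonus can legitimately be read as zero, which is a business rule to confirm; reported_bonus_was_missing is a hypothetical column name.

```sql
-- Sketch: keep a record of which values were filled in.
SELECT
  application_id,
  COALESCE(reported_bonus, 0) AS reported_bonus,
  -- TRUE when the 0 above is a default rather than a reported value.
  (reported_bonus IS NULL) AS reported_bonus_was_missing
FROM customer_applications;
```

Downstream queries can then filter or segment on the flag instead of guessing which zeros are real.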

Aggregate distortion

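Missing values affect metrics differently depending on the function. AVG(column) ignores NULLs, so its denominator counts only non-NULL rows. Replacing NULL with 0 adds those rows back to the denominator while contributing nothing to the numerator, which pulls the average toward zero and can materially misstate a financial KPI. Counts differ too: COUNT(column) skips NULLs, while COUNT(*) counts every row.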

SELECT
  AVG(monthly_income) AS avg_excluding_nulls,
  AVG(COALESCE(monthly_income, 0)) AS avg_treating_missing_as_zero
FROM customer_applications;
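
To make the size of the distortion concrete, the same comparison can be run on a tiny inline dataset. The three values below are made up for illustration, and sample_applications is a hypothetical name.

```sql
-- Sketch: two reported incomes and one missing value.
WITH sample_applications (monthly_income) AS (
  VALUES (4000), (6000), (NULL)
)
SELECT
  -- (4000 + 6000) / 2 = 5000
  AVG(monthly_income) AS avg_excluding_nulls,
  -- (4000 + 6000 + 0) / 3, roughly 3333.33
  AVG(COALESCE(monthly_income, 0)) AS avg_treating_missing_as_zero
FROM sample_applications;
```

A single missing row out of three lowers the reported average by a third here, which is why the choice between the two expressions should be stated explicitly in any report.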
