
A


At companies like Northstar Analytics, SQL results are often shared directly with stakeholders, so data quality issues can quickly undermine trust. Interviewers ask this question to understand whether you validate your outputs instead of only writing queries that run.
Explain how you ensure the accuracy and integrity of your data before presenting findings. Your answer should cover how you check for missing values, duplicates, invalid records, and aggregation mismatches. Also describe how you confirm that your final numbers align with business definitions and source data.
Keep the discussion practical and SQL-focused. The interviewer is not looking for a full data governance framework; they want a clear workflow you would follow when validating query results, examples of checks you would run, and how you would communicate assumptions or limitations before presenting insights.
Before reporting, verify that required fields are populated and that key columns do not contain unexpected NULLs. Missing values in IDs, dates, or metrics can distort counts, averages, and downstream business conclusions.
SELECT COUNT(*) AS null_customer_ids
FROM orders
WHERE customer_id IS NULL;
Check whether records that should be unique, such as transaction IDs or event IDs, appear more than once. Duplicate rows can inflate totals and create misleading trends if not identified early.
SELECT order_id, COUNT(*) AS row_count
FROM orders
GROUP BY order_id
HAVING COUNT(*) > 1;
Validate that values fall within expected business rules, such as non-negative revenue, valid statuses, or realistic dates. These checks catch bad source data and logic errors in transformations.
SELECT *
FROM orders
WHERE amount < 0
OR order_date > CURRENT_DATE;
Compare your final aggregates to trusted source counts or prior known benchmarks. Reconciliation helps confirm that filters, grouping logic, and metric definitions are producing expected totals.
SELECT order_date, COUNT(*) AS order_count, SUM(amount) AS total_revenue
FROM orders
GROUP BY order_date
ORDER BY order_date;
A query can be technically correct but still wrong for the business if the metric definition is misunderstood. Always confirm how the business defines terms like active user, completed order, or net revenue before presenting results.