

A
Database validation is a common part of analytics engineering, ETL testing, and production data quality checks. Interviewers ask this to understand whether you can use SQL to confirm that data is complete, accurate, and consistent.
Describe your experience using SQL for database validation. In your answer, explain how you would use SQL to:
Keep the discussion practical rather than theoretical. The interviewer expects a structured explanation of the most common validation patterns, the SQL queries you would write, what issues they catch, and how you would prioritize checks in a real workflow.
Row count checks confirm whether the expected number of records exists after an ingestion or transformation step. They are often the first validation because they quickly reveal missing loads, duplicate loads, or unexpected filtering.
SELECT COUNT(*) AS row_count
FROM orders;
SQL is commonly used to test whether required columns contain NULLs or invalid values outside allowed ranges. These checks help catch schema issues, parsing failures, and bad upstream data before it affects reporting.
SELECT COUNT(*) AS invalid_rows
FROM orders
WHERE customer_id IS NULL
OR order_total < 0;
Validation often includes checking whether a supposed key is actually unique. Grouping by the key and filtering to counts greater than one is a simple way to identify duplicate records that can distort metrics.
SELECT order_id, COUNT(*) AS duplicate_count
FROM orders
GROUP BY order_id
HAVING COUNT(*) > 1;
Aggregate validation compares totals such as row counts, sums, or averages between source and target tables. This is useful after ETL jobs to confirm that transformations did not lose or incorrectly alter data.
SELECT
COUNT(*) AS row_count,
SUM(order_total) AS total_revenue
FROM orders;
Some checks go beyond technical correctness and verify that the data follows business logic. Examples include ensuring shipped orders have a ship date or that refunded amounts do not exceed original payments.
SELECT COUNT(*) AS rule_violations
FROM orders
WHERE order_status = 'shipped'
AND shipped_at IS NULL;