
AA

Data teams at companies like Stripe use SQL to verify that loaded or transformed data matches expectations before dashboards, reports, or downstream models rely on it. Interviewers ask this to assess whether you can use SQL as a practical testing tool, not just for analysis.
Explain how you would use SQL to verify data during testing. In your answer, cover:
Keep the answer focused on common SQL-based validation patterns used in analytics engineering, ETL testing, or QA. The interviewer is looking for a structured approach, clear examples, and awareness of common failure modes rather than advanced database internals.
A basic validation step is confirming that the number of rows loaded or transformed matches expectations. This helps detect missing records, accidental filtering, duplicate ingestion, or partial pipeline failures.
SELECT COUNT(*) AS row_count
FROM orders;
SQL is commonly used to check whether required fields contain NULL values or unexpected blanks. This is important because missing keys, dates, or amounts often break downstream joins, aggregations, or reporting logic.
SELECT COUNT(*) AS null_customer_ids
FROM orders
WHERE customer_id IS NULL;
Primary business identifiers such as order_id or user_id should often be unique at a defined grain. SQL can quickly surface duplicate records that indicate bad merges, repeated loads, or incorrect transformations.
SELECT order_id, COUNT(*) AS duplicate_count
FROM orders
GROUP BY order_id
HAVING COUNT(*) > 1;
Beyond structural checks, SQL can verify domain rules such as non-negative amounts, valid statuses, or date relationships. These checks ensure the data is not only present, but logically correct for the business process.
SELECT COUNT(*) AS invalid_rows
FROM orders
WHERE amount < 0
OR order_date > shipped_date;
Testing often involves comparing actual output against an expected result set or known benchmark. SQL supports this through filtered checks, grouped summaries, and set-based comparisons to isolate where mismatches occur.
SELECT status, COUNT(*) AS actual_count
FROM orders
GROUP BY status;