
Reporting datasets often fail because filtering rules are applied inconsistently across source rows, aggregated outputs, and downstream dashboards. Interviewers ask this to see whether you can separate business logic from SQL mechanics and build reliable, reusable datasets.
Explain how you would handle filtering logic when building a reporting dataset in SQL. Your answer should cover:
Keep the discussion practical. The interviewer is looking for a structured explanation of how to design filtering logic for reporting, not just a definition of WHERE and HAVING.
Use WHERE to remove rows before aggregation. This is the right place for conditions such as date ranges, status filters, null checks, or excluding test data because those rules define which source records belong in the dataset.
SELECT region, SUM(revenue) AS total_revenue
FROM sales
WHERE order_date >= DATE '2024-01-01'
AND status = 'completed'
GROUP BY region;
Use HAVING to filter grouped results after aggregation has been calculated. This is appropriate for rules like keeping only regions with more than 100 orders or customers with total spend above a threshold.
SELECT region, COUNT(*) AS order_count
FROM sales
WHERE status = 'completed'
GROUP BY region
HAVING COUNT(*) >= 100;
For reporting datasets, it is often better to derive boolean or categorical flags with CASE WHEN instead of hard-coding every rule in a final filter. This makes the dataset easier to audit and lets downstream users choose whether to include or exclude certain records.
SELECT
order_id,
CASE
WHEN status = 'completed' AND is_test = false THEN true
ELSE false
END AS is_reportable
FROM sales;
Filtering logic should be defined once and reused consistently. If one dashboard excludes refunds and another does not, metrics will not reconcile even if both queries are technically correct.
SELECT product_category, SUM(amount) AS net_sales
FROM transactions
WHERE include_in_reporting = true
GROUP BY product_category;
Reporting filters often break when nulls, empty strings, or unexpected status values are not handled explicitly. Good filtering logic defines what to do with incomplete or invalid records instead of relying on implicit SQL behavior.
SELECT COUNT(*) AS valid_rows
FROM customers
WHERE COALESCE(email, '') <> '';