




At companies like FleetOps, operational records often come from multiple systems such as warehouse apps, delivery tools, and support platforms. Analysts need a consistent way to combine these datasets before reporting or downstream processing.
Explain how you would use UNION and UNION ALL to combine operational data from multiple sources. In your answer, cover:
UNION and UNION ALLThe interviewer expects a practical explanation, not just syntax. You should describe how to standardize source datasets, choose the right set operator, and prepare the combined result for reporting or aggregation in PostgreSQL.
UNION removes duplicate rows across result sets, while UNION ALL keeps every row. In operational pipelines, UNION ALL is often preferred first because it preserves raw records and avoids hiding duplicate data issues.
SELECT order_id, event_date, amount
FROM web_orders
UNION ALL
SELECT order_id, event_date, amount
FROM store_orders;
Each SELECT in a union must return the same number of columns in the same order, and the columns should represent the same business meaning. If source schemas differ, you standardize them with aliases, casts, constants, or NULL placeholders.
SELECT order_id, created_at::date AS event_date, total_amount AS amount, 'web' AS source
FROM web_orders
UNION ALL
SELECT sale_id AS order_id, sale_date AS event_date, amount, 'store' AS source
FROM store_sales;
Removing duplicates too early can hide legitimate repeated events, especially when two systems capture similar but not identical records. A safer approach is often to union all data first, retain a source column, and deduplicate later using explicit business rules.
SELECT order_id, event_date, amount, source
FROM (
SELECT order_id, event_date, amount, 'web' AS source FROM web_orders
UNION ALL
SELECT order_id, event_date, amount, 'store' AS source FROM store_orders
) combined;
After combining source data, you can aggregate over the unified result to produce totals, counts, or daily metrics. This is useful when multiple operational systems feed the same reporting layer.
SELECT event_date, SUM(amount) AS total_amount
FROM (
SELECT created_at::date AS event_date, total_amount AS amount FROM web_orders
UNION ALL
SELECT sale_date AS event_date, amount FROM store_sales
) combined
GROUP BY event_date
ORDER BY event_date;