


A
Analysts often need to answer business questions like revenue by customer, orders by region, or average spend by product category. These problems usually require combining tables correctly and then aggregating the data at the right level.
Explain how you would write a SELECT statement with JOINs and GROUP BY to answer a business question. In your answer, describe:
SELECT versus GROUP BYCOUNT(), SUM(), and AVG() fit into the queryThe interviewer expects a practical explanation, not just definitions. You should be able to describe a clear approach, discuss trade-offs such as INNER JOIN vs LEFT JOIN, and use a simple business example to show how the query is built.
Start with the table that represents the grain of the question. If the business question is about orders, the orders table is often the base because each row represents one business event being analyzed.
SELECT o.customer_id, COUNT(*) AS order_count
FROM orders o
GROUP BY o.customer_id;
Use INNER JOIN when you only want rows with matches in both tables. Use LEFT JOIN when you need to keep all rows from the base table, even if related data is missing.
SELECT c.customer_name, COUNT(o.order_id) AS order_count
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_name;
Every non-aggregated column in the SELECT list must appear in GROUP BY. The grouping columns define the level of the result, such as one row per customer, per month, or per product category.
SELECT p.category, SUM(oi.quantity * oi.unit_price) AS revenue
FROM order_items oi
JOIN products p ON oi.product_id = p.product_id
GROUP BY p.category;
Functions like COUNT(), SUM(), AVG(), MIN(), and MAX() summarize many rows into one value per group. They are what turn transactional data into business metrics.
SELECT region, AVG(order_total) AS avg_order_value
FROM orders
GROUP BY region;
Joining tables at different grains can multiply rows and inflate aggregates. For example, joining orders to order_items creates one row per item, so counting orders without care may overcount.
SELECT o.customer_id, COUNT(DISTINCT o.order_id) AS order_count
FROM orders o
JOIN order_items oi ON o.order_id = oi.order_id
GROUP BY o.customer_id;