


At companies like Stripe or Airbnb, analysts often need to answer questions that depend on intermediate results, such as finding customers above average spend or products with the highest sales. Subqueries are one of the core SQL tools for solving these problems.
Explain what a subquery is in SQL and how it works. In your answer, cover:
Keep the explanation at an interview-ready level. The interviewer is not looking for database internals, but they do expect clear definitions, practical examples, and an understanding of when subqueries are the right tool versus when another approach may be better.
A subquery is a query nested inside another SQL statement. It produces a result that the outer query uses for filtering, comparison, or deriving values.
SELECT customer_id, total_spent
FROM customer_spend
WHERE total_spent > (
SELECT AVG(total_spent)
FROM customer_spend
);
A scalar subquery returns exactly one value and is often used with operators like = or >. A multi-row subquery returns multiple values and is commonly used with IN, ANY, or ALL.
SELECT product_name
FROM products
WHERE category_id IN (
SELECT category_id
FROM featured_categories
);
A correlated subquery references columns from the outer query, so it is evaluated in relation to each outer row. This is useful for row-by-row comparisons, such as checking whether a value is above a group-specific average.
SELECT e.employee_id, e.department_id, e.salary
FROM employees e
WHERE e.salary > (
SELECT AVG(e2.salary)
FROM employees e2
WHERE e2.department_id = e.department_id
);
Subqueries are often easier to read for filtering based on derived sets or single summary values. Joins can be better when you need columns from multiple tables or want the optimizer to work with a flatter query shape.
SELECT customer_id
FROM orders
WHERE customer_id IN (
SELECT customer_id
FROM vip_customers
);
Subqueries are logically valid but can become inefficient or misleading if used carelessly. Common issues include using = with a subquery that returns multiple rows, or writing correlated subqueries when a grouped join would be clearer and faster.
SELECT order_id
FROM orders
WHERE amount = (
SELECT MAX(amount)
FROM orders
);