Interview Guides

Debug and Optimize Slow PostgreSQL Queries | Dataford Interview Questions - Dataford - Ace your Interview

Debug and Optimize Slow PostgreSQL Queries

Hard

SQL & Data Manipulation

Asked at 1 company1JoinsAggregationsWindow FunctionsData Wrangling

Also asked at

Problem

Context

Slow SQL can block dashboards, delay batch jobs, and create cascading issues in downstream systems. For an engineering manager, the interviewer wants to hear both technical depth and a structured debugging process.

Question

You are told that a PostgreSQL query powering an Apple Analytics dashboard has grown from minutes to two hours. Walk through how you would debug and optimize it. Explain how you would isolate whether the problem comes from joins, aggregations, filters, window functions, subqueries, data growth, or missing indexes, and how you would validate that an optimization actually improved the query without changing the result.

Scope guidance

Answer at the level of a senior engineer or manager: start with how you would inspect the query and execution plan, then discuss likely bottlenecks, concrete optimization techniques in PostgreSQL, and the trade-offs or risks of each change.

Key Concepts

EXPLAIN and EXPLAIN ANALYZE

In PostgreSQL, EXPLAIN shows the optimizer's planned execution path, while EXPLAIN ANALYZE runs the query and reports actual timing, row counts, and loop counts. The gap between estimated and actual rows is often the fastest way to find bad cardinality estimates, stale statistics, or inefficient joins.

EXPLAIN ANALYZE
SELECT *
FROM orders o
JOIN customers c ON c.customer_id = o.customer_id
WHERE o.order_date >= DATE '2024-01-01';

Join strategy and cardinality

Slow queries often come from joining large tables too early, using the wrong join type, or suffering from row explosion in one-to-many relationships. Understanding whether PostgreSQL chose a nested loop, hash join, or merge join helps identify whether the problem is indexing, memory, or data volume.

SELECT s.session_id, e.event_name
FROM sessions s
JOIN events e ON e.session_id = s.session_id;

Index design for filters and joins

Indexes help when predicates are selective and align with join keys, filter columns, and sort columns. A missing or poorly ordered composite index can force sequential scans, expensive sorts, or repeated lookups on large tables.

CREATE INDEX idx_events_session_date
ON events (session_id, event_timestamp);

Query rewrites

A query can often be improved by pushing filters earlier, pre-aggregating before joins, replacing correlated subqueries, or removing unnecessary DISTINCT and repeated window calculations. Rewriting is especially useful when the SQL shape causes far more intermediate rows than the final output needs.

WITH filtered_orders AS (
  SELECT order_id, customer_id, order_total
  FROM orders
  WHERE order_date >= DATE '2024-01-01'
)
SELECT customer_id, SUM(order_total)
FROM filtered_orders
GROUP BY customer_id;

Validation and regression safety

Optimization is not complete until you prove the result set is still correct and the improvement is stable under production-like data volumes. Good answers include comparing row counts and aggregates before and after, checking plan changes, and monitoring whether the query remains fast as data grows.

EXPLAIN (ANALYZE, BUFFERS)
SELECT customer_id, SUM(order_total)
FROM orders
GROUP BY customer_id;

Problem

Context

Question

Scope guidance

Key Concepts

EXPLAIN and EXPLAIN ANALYZE

EXPLAIN ANALYZE
SELECT *
FROM orders o
JOIN customers c ON c.customer_id = o.customer_id
WHERE o.order_date >= DATE '2024-01-01';

Join strategy and cardinality

SELECT s.session_id, e.event_name
FROM sessions s
JOIN events e ON e.session_id = s.session_id;

Index design for filters and joins

CREATE INDEX idx_events_session_date
ON events (session_id, event_timestamp);

Query rewrites

WITH filtered_orders AS (
  SELECT order_id, customer_id, order_total
  FROM orders
  WHERE order_date >= DATE '2024-01-01'
)
SELECT customer_id, SUM(order_total)
FROM filtered_orders
GROUP BY customer_id;

Validation and regression safety

EXPLAIN (ANALYZE, BUFFERS)
SELECT customer_id, SUM(order_total)
FROM orders
GROUP BY customer_id;

Your answer

Try one AI text evaluation on us

Get structured feedback, scored against a 4-axis rubric. Premium unlocks unlimited.

0 wordstarget ~200

Up next

Optimize Slow PostgreSQL Reporting QueriesEasy

Optimizing Slow PostgreSQL QueriesMedium

Optimizing Slow Sales QueriesEasy

Next question

Debug and Optimize Slow PostgreSQL Queries

Hard

SQL & Data Manipulation

Asked at 1 company1JoinsAggregationsWindow FunctionsData Wrangling

Also asked at

Problem

Context

Question

Scope guidance

Key Concepts

EXPLAIN and EXPLAIN ANALYZE

EXPLAIN ANALYZE
SELECT *
FROM orders o
JOIN customers c ON c.customer_id = o.customer_id
WHERE o.order_date >= DATE '2024-01-01';

Join strategy and cardinality

SELECT s.session_id, e.event_name
FROM sessions s
JOIN events e ON e.session_id = s.session_id;

Index design for filters and joins

CREATE INDEX idx_events_session_date
ON events (session_id, event_timestamp);

Query rewrites

WITH filtered_orders AS (
  SELECT order_id, customer_id, order_total
  FROM orders
  WHERE order_date >= DATE '2024-01-01'
)
SELECT customer_id, SUM(order_total)
FROM filtered_orders
GROUP BY customer_id;

Validation and regression safety

EXPLAIN (ANALYZE, BUFFERS)
SELECT customer_id, SUM(order_total)
FROM orders
GROUP BY customer_id;

Problem

Context

Question

Scope guidance

Key Concepts

EXPLAIN and EXPLAIN ANALYZE

EXPLAIN ANALYZE
SELECT *
FROM orders o
JOIN customers c ON c.customer_id = o.customer_id
WHERE o.order_date >= DATE '2024-01-01';

Join strategy and cardinality

SELECT s.session_id, e.event_name
FROM sessions s
JOIN events e ON e.session_id = s.session_id;

Index design for filters and joins

CREATE INDEX idx_events_session_date
ON events (session_id, event_timestamp);

Query rewrites

WITH filtered_orders AS (
  SELECT order_id, customer_id, order_total
  FROM orders
  WHERE order_date >= DATE '2024-01-01'
)
SELECT customer_id, SUM(order_total)
FROM filtered_orders
GROUP BY customer_id;

Validation and regression safety

EXPLAIN (ANALYZE, BUFFERS)
SELECT customer_id, SUM(order_total)
FROM orders
GROUP BY customer_id;

Your answer

Try one AI text evaluation on us

Get structured feedback, scored against a 4-axis rubric. Premium unlocks unlimited.

0 wordstarget ~200

Up next

Optimize Slow PostgreSQL Reporting QueriesEasy

Optimizing Slow PostgreSQL QueriesMedium

Optimizing Slow Sales QueriesEasy

Next question