AAt Discord scale, product analytics queries often run against very large event tables such as message sends, session starts, or notification deliveries. A query that works on a small dataset can time out once the table reaches billions of rows.
You are asked to explain how you would optimize a PostgreSQL query that is timing out on a billion-row table. Walk through how you would diagnose the bottleneck, what parts of the SQL you would inspect first, and how you would improve the query itself before proposing broader table or infrastructure changes. You should also explain how joins, filters, aggregations, and date logic can affect performance on large Discord event datasets.
The interviewer expects a practical, SQL-focused answer rather than a generic database lecture. You should cover query rewrites, indexing strategy, partitioning, and how you would validate that the optimized version is actually better.
The first goal is to reduce how many rows PostgreSQL has to read. Highly selective predicates on time ranges, event types, guilds, or surfaces should be applied as early as possible so later joins and aggregations operate on a much smaller working set.
SELECT user_id, COUNT(*)
FROM message_events
WHERE event_date >= DATE '2024-01-01'
AND event_date < DATE '2024-02-01'
AND surface = 'server_text_channel'
GROUP BY user_id;
Indexes help only when they align with the query's filtering, joining, or sorting pattern. For large analytics tables, composite indexes on common filter columns such as event_date and surface, or join keys such as user_id, can dramatically reduce scanned pages.
CREATE INDEX idx_message_events_date_surface_user
ON message_events (event_date, surface, user_id);
Repeated DISTINCTs, broad joins, and aggregations over the full event table are common causes of timeouts. It is usually faster to pre-filter or pre-aggregate the large fact table in a CTE or subquery before joining to smaller dimension tables.
WITH filtered_events AS (
SELECT user_id, guild_id
FROM message_events
WHERE event_date >= CURRENT_DATE - INTERVAL '7 days'
)
SELECT g.guild_name, COUNT(*)
FROM filtered_events fe
JOIN guilds g ON fe.guild_id = g.guild_id
GROUP BY g.guild_name;
Two queries that look similar can perform very differently. EXPLAIN or EXPLAIN ANALYZE shows whether PostgreSQL is doing a sequential scan, using an index, spilling a sort to disk, or producing a huge intermediate join result.
EXPLAIN ANALYZE
SELECT user_id, COUNT(*)
FROM message_events
WHERE event_date >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY user_id;
When the base table is extremely large, query rewrites alone may not be enough. Range partitioning by date and precomputed daily aggregates can reduce scan volume substantially for recurring product analytics questions.
SELECT event_date, guild_id, messages_sent
FROM daily_guild_message_summary
WHERE event_date >= DATE '2024-01-01'
AND event_date < DATE '2024-02-01';