
At Meta scale, a 10TB event log backing surfaces like Facebook Feed or Instagram Reels can become slow to query and expensive to maintain if it remains a single heap table. Interviewers want to see whether you can design a partitioning strategy that improves scan efficiency without making ingestion and maintenance fragile.
Explain how you would partition a 10TB events table in PostgreSQL for performance. Your answer should cover:
Keep the answer practical and PostgreSQL-specific. The interviewer is not looking for generic distributed data lake advice; they want a database design discussion grounded in partition pruning, index strategy, maintenance operations, and operational trade-offs.
The partition key should align with the most common and most selective query predicates. For large append-only event tables, event_date or event_time is usually the best primary partition key because most analytics and retention operations are time-bounded.
CREATE TABLE events (
event_id BIGINT,
user_id BIGINT,
event_time TIMESTAMP NOT NULL,
event_date DATE GENERATED ALWAYS AS (event_time::date) STORED,
event_name TEXT,
surface TEXT,
payload JSONB
) PARTITION BY RANGE (event_date);
Range partitioning is usually the right fit for event tables because data arrives over time and queries commonly filter by recent days, weeks, or months. It also makes retention simple because old partitions can be detached or dropped without rewriting the full table.
CREATE TABLE events_2025_01 PARTITION OF events
FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');
Partition pruning lets PostgreSQL skip partitions that cannot match a query predicate. This is the main performance benefit: if a query asks for the last 7 days, the planner can avoid scanning years of historical partitions.
SELECT COUNT(*)
FROM events
WHERE event_date >= CURRENT_DATE - INTERVAL '7 days'
AND surface = 'facebook_feed';
Indexes are created on the partitioned table definition, but PostgreSQL maintains physical indexes on each partition. You should keep indexes minimal and aligned with actual filters, such as (event_date, surface) or (user_id, event_time) depending on access patterns.
CREATE INDEX ON events (surface, event_time);
CREATE INDEX ON events (user_id, event_time);
Too few partitions reduce pruning benefits, while too many partitions increase planning time, catalog overhead, and maintenance complexity. A strong answer balances query performance with ingestion throughput, partition management automation, and retention workflows.