Interview Guides

Partition Meta Events Table | Dataford Interview Questions - Dataford - Ace your Interview

Partition Meta Events Table

Hard

SQL & Data Manipulation

Asked at 1 company1

Also asked at

Problem

Context

At Meta scale, a 10TB event log backing surfaces like Facebook Feed or Instagram Reels can become slow to query and expensive to maintain if it remains a single heap table. Interviewers want to see whether you can design a partitioning strategy that improves scan efficiency without making ingestion and maintenance fragile.

Core question

Explain how you would partition a 10TB events table in PostgreSQL for performance. Your answer should cover:

Which partition key you would choose and why
Whether you would use range, list, or hash partitioning
How query patterns affect the design
How indexes should be defined on partitions
How you would handle retention, backfills, and skewed tenants or event types
What trade-offs or failure modes you would watch for

Scope guidance

Keep the answer practical and PostgreSQL-specific. The interviewer is not looking for generic distributed data lake advice; they want a database design discussion grounded in partition pruning, index strategy, maintenance operations, and operational trade-offs.

Key Concepts

Partition key selection

The partition key should align with the most common and most selective query predicates. For large append-only event tables, event_date or event_time is usually the best primary partition key because most analytics and retention operations are time-bounded.

CREATE TABLE events (
  event_id BIGINT,
  user_id BIGINT,
  event_time TIMESTAMP NOT NULL,
  event_date DATE GENERATED ALWAYS AS (event_time::date) STORED,
  event_name TEXT,
  surface TEXT,
  payload JSONB
) PARTITION BY RANGE (event_date);

Range partitioning for time-series data

Range partitioning is usually the right fit for event tables because data arrives over time and queries commonly filter by recent days, weeks, or months. It also makes retention simple because old partitions can be detached or dropped without rewriting the full table.

CREATE TABLE events_2025_01 PARTITION OF events
FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');

Partition pruning

Partition pruning lets PostgreSQL skip partitions that cannot match a query predicate. This is the main performance benefit: if a query asks for the last 7 days, the planner can avoid scanning years of historical partitions.

SELECT COUNT(*)
FROM events
WHERE event_date >= CURRENT_DATE - INTERVAL '7 days'
  AND surface = 'facebook_feed';

Local indexing on partitions

Indexes are created on the partitioned table definition, but PostgreSQL maintains physical indexes on each partition. You should keep indexes minimal and aligned with actual filters, such as (event_date, surface) or (user_id, event_time) depending on access patterns.

CREATE INDEX ON events (surface, event_time);
CREATE INDEX ON events (user_id, event_time);

Operational trade-offs

Too few partitions reduce pruning benefits, while too many partitions increase planning time, catalog overhead, and maintenance complexity. A strong answer balances query performance with ingestion throughput, partition management automation, and retention workflows.

Problem

Context

Core question

Explain how you would partition a 10TB events table in PostgreSQL for performance. Your answer should cover:

Which partition key you would choose and why
Whether you would use range, list, or hash partitioning
How query patterns affect the design
How indexes should be defined on partitions
How you would handle retention, backfills, and skewed tenants or event types
What trade-offs or failure modes you would watch for

Scope guidance

Key Concepts

Partition key selection

CREATE TABLE events (
  event_id BIGINT,
  user_id BIGINT,
  event_time TIMESTAMP NOT NULL,
  event_date DATE GENERATED ALWAYS AS (event_time::date) STORED,
  event_name TEXT,
  surface TEXT,
  payload JSONB
) PARTITION BY RANGE (event_date);

Range partitioning for time-series data

CREATE TABLE events_2025_01 PARTITION OF events
FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');

Partition pruning

SELECT COUNT(*)
FROM events
WHERE event_date >= CURRENT_DATE - INTERVAL '7 days'
  AND surface = 'facebook_feed';

Local indexing on partitions

CREATE INDEX ON events (surface, event_time);
CREATE INDEX ON events (user_id, event_time);

Operational trade-offs

Your answer

Try one AI text evaluation on us

Get structured feedback, scored against a 4-axis rubric. Premium unlocks unlimited.

0 wordstarget ~200

Up next

Partitioning vs Bucketing at ScaleMedium AOptimizing Discord Event QueriesMedium

Design Experiment Assignment SchemaEasy

Next question

Partition Meta Events Table

Hard

SQL & Data Manipulation

Asked at 1 company1

Also asked at

Problem

Context

Core question

Explain how you would partition a 10TB events table in PostgreSQL for performance. Your answer should cover:

Which partition key you would choose and why
Whether you would use range, list, or hash partitioning
How query patterns affect the design
How indexes should be defined on partitions
How you would handle retention, backfills, and skewed tenants or event types
What trade-offs or failure modes you would watch for

Scope guidance

Key Concepts

Partition key selection

CREATE TABLE events (
  event_id BIGINT,
  user_id BIGINT,
  event_time TIMESTAMP NOT NULL,
  event_date DATE GENERATED ALWAYS AS (event_time::date) STORED,
  event_name TEXT,
  surface TEXT,
  payload JSONB
) PARTITION BY RANGE (event_date);

Range partitioning for time-series data

CREATE TABLE events_2025_01 PARTITION OF events
FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');

Partition pruning

SELECT COUNT(*)
FROM events
WHERE event_date >= CURRENT_DATE - INTERVAL '7 days'
  AND surface = 'facebook_feed';

Local indexing on partitions

CREATE INDEX ON events (surface, event_time);
CREATE INDEX ON events (user_id, event_time);

Operational trade-offs

Problem

Context

Core question

Explain how you would partition a 10TB events table in PostgreSQL for performance. Your answer should cover:

Which partition key you would choose and why
Whether you would use range, list, or hash partitioning
How query patterns affect the design
How indexes should be defined on partitions
How you would handle retention, backfills, and skewed tenants or event types
What trade-offs or failure modes you would watch for

Scope guidance

Key Concepts

Partition key selection

CREATE TABLE events (
  event_id BIGINT,
  user_id BIGINT,
  event_time TIMESTAMP NOT NULL,
  event_date DATE GENERATED ALWAYS AS (event_time::date) STORED,
  event_name TEXT,
  surface TEXT,
  payload JSONB
) PARTITION BY RANGE (event_date);

Range partitioning for time-series data

CREATE TABLE events_2025_01 PARTITION OF events
FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');

Partition pruning

SELECT COUNT(*)
FROM events
WHERE event_date >= CURRENT_DATE - INTERVAL '7 days'
  AND surface = 'facebook_feed';

Local indexing on partitions

CREATE INDEX ON events (surface, event_time);
CREATE INDEX ON events (user_id, event_time);

Operational trade-offs

Your answer

Try one AI text evaluation on us

Get structured feedback, scored against a 4-axis rubric. Premium unlocks unlimited.

0 wordstarget ~200

Up next

Partitioning vs Bucketing at ScaleMedium AOptimizing Discord Event QueriesMedium

Design Experiment Assignment SchemaEasy

Next question