Interview Guides

Design Experiment Assignment Schema | Dataford Interview Questions - Dataford - Ace your Interview

Design Experiment Assignment Schema

Easy

SQL & Data Manipulation

Asked at 1 company1

Also asked at

Problem

Context

At Meta, teams running experiments on surfaces like Facebook Feed or Instagram Reels need schemas that support both reliable analysis and efficient SQL querying. A weak design makes it hard to answer basic questions like who was assigned, when they were exposed, and whether they converted.

Core question

Explain how you would design a PostgreSQL schema to track A/B test assignments and outcomes. Your answer should cover:

The core tables and keys needed to represent experiments, variants, assignments, exposures, and outcome events
How to handle assignment grain, late-arriving events, and users with multiple events
How analysts would join the tables to compute experiment metrics such as conversion rate and guardrails
What constraints or modeling choices help prevent common data quality issues

Scope guidance

The interviewer expects a practical schema design discussion grounded in SQL analytics. Focus on table structure, join paths, event timestamps, deduplication strategy, and trade-offs between normalized event tables and pre-aggregated outcome tables.

Key Concepts

Assignment Table as the Source of Truth

The assignment table should record the unit of randomization, experiment, variant, and assignment timestamp. This table is the denominator for most experiment analyses because it defines who was eligible and assigned, even if they never generated downstream events.

SELECT experiment_id, variant_id, COUNT(*) AS assigned_users
FROM experiment_assignments
GROUP BY experiment_id, variant_id;

Separate Exposure from Assignment

Assignment does not always mean the user actually saw the treatment. A separate exposure table lets analysts distinguish intent-to-treat analysis from exposed-user analysis and helps diagnose delivery issues.

SELECT a.experiment_id, a.variant_id,
       COUNT(DISTINCT a.user_id) AS assigned_users,
       COUNT(DISTINCT e.user_id) AS exposed_users
FROM experiment_assignments a
LEFT JOIN experiment_exposures e
  ON a.experiment_id = e.experiment_id
 AND a.user_id = e.user_id
GROUP BY a.experiment_id, a.variant_id;

Outcome Events at Event Grain

Store outcomes such as click, purchase, or session length in an event table at the lowest useful grain. This preserves flexibility because analysts can later define different metrics, attribution windows, and guardrails without changing raw data.

SELECT user_id, event_name, event_ts, metric_value
FROM experiment_outcomes
WHERE event_name IN ('signup', 'purchase');

Time Windows and Attribution Logic

Experiment analysis depends on whether an outcome happened after assignment or after first exposure and within a defined window. The schema must include timestamps that make these filters explicit and reproducible in SQL.

WITH first_exposure AS (
  SELECT experiment_id, user_id, MIN(exposure_ts) AS first_exposure_ts
  FROM experiment_exposures
  GROUP BY experiment_id, user_id
)
SELECT o.user_id, o.event_name
FROM experiment_outcomes o
JOIN first_exposure f
  ON o.experiment_id = f.experiment_id
 AND o.user_id = f.user_id
WHERE o.event_ts >= f.first_exposure_ts
  AND o.event_ts < f.first_exposure_ts + INTERVAL '7 days';

Data Quality Constraints and Deduplication

Good schema design uses primary keys, foreign keys, uniqueness constraints, and event identifiers to prevent double counting and invalid joins. Without these controls, experiment metrics can be wrong even if the SQL query looks correct.

ALTER TABLE experiment_assignments
ADD CONSTRAINT uq_assignment UNIQUE (experiment_id, user_id);

ALTER TABLE experiment_outcomes
ADD CONSTRAINT uq_outcome_event UNIQUE (event_id);

Problem

Context

Core question

Explain how you would design a PostgreSQL schema to track A/B test assignments and outcomes. Your answer should cover:

The core tables and keys needed to represent experiments, variants, assignments, exposures, and outcome events
How to handle assignment grain, late-arriving events, and users with multiple events
How analysts would join the tables to compute experiment metrics such as conversion rate and guardrails
What constraints or modeling choices help prevent common data quality issues

Scope guidance

Key Concepts

Assignment Table as the Source of Truth

SELECT experiment_id, variant_id, COUNT(*) AS assigned_users
FROM experiment_assignments
GROUP BY experiment_id, variant_id;

Separate Exposure from Assignment

SELECT a.experiment_id, a.variant_id,
       COUNT(DISTINCT a.user_id) AS assigned_users,
       COUNT(DISTINCT e.user_id) AS exposed_users
FROM experiment_assignments a
LEFT JOIN experiment_exposures e
  ON a.experiment_id = e.experiment_id
 AND a.user_id = e.user_id
GROUP BY a.experiment_id, a.variant_id;

Outcome Events at Event Grain

SELECT user_id, event_name, event_ts, metric_value
FROM experiment_outcomes
WHERE event_name IN ('signup', 'purchase');

Time Windows and Attribution Logic

WITH first_exposure AS (
  SELECT experiment_id, user_id, MIN(exposure_ts) AS first_exposure_ts
  FROM experiment_exposures
  GROUP BY experiment_id, user_id
)
SELECT o.user_id, o.event_name
FROM experiment_outcomes o
JOIN first_exposure f
  ON o.experiment_id = f.experiment_id
 AND o.user_id = f.user_id
WHERE o.event_ts >= f.first_exposure_ts
  AND o.event_ts < f.first_exposure_ts + INTERVAL '7 days';

Data Quality Constraints and Deduplication

ALTER TABLE experiment_assignments
ADD CONSTRAINT uq_assignment UNIQUE (experiment_id, user_id);

ALTER TABLE experiment_outcomes
ADD CONSTRAINT uq_outcome_event UNIQUE (event_id);

Your answer

Try one AI text evaluation on us

Get structured feedback, scored against a 4-axis rubric. Premium unlocks unlimited.

0 wordstarget ~200

Up next

Partition Meta Events TableHard

Schema Design for Analytics vs OLTPMedium Working with PostgreSQL and CockroachDBEasy

Next question

Design Experiment Assignment Schema

Easy

SQL & Data Manipulation

Asked at 1 company1

Also asked at

Problem

Context

Core question

Explain how you would design a PostgreSQL schema to track A/B test assignments and outcomes. Your answer should cover:

The core tables and keys needed to represent experiments, variants, assignments, exposures, and outcome events
How to handle assignment grain, late-arriving events, and users with multiple events
How analysts would join the tables to compute experiment metrics such as conversion rate and guardrails
What constraints or modeling choices help prevent common data quality issues

Scope guidance

Key Concepts

Assignment Table as the Source of Truth

SELECT experiment_id, variant_id, COUNT(*) AS assigned_users
FROM experiment_assignments
GROUP BY experiment_id, variant_id;

Separate Exposure from Assignment

SELECT a.experiment_id, a.variant_id,
       COUNT(DISTINCT a.user_id) AS assigned_users,
       COUNT(DISTINCT e.user_id) AS exposed_users
FROM experiment_assignments a
LEFT JOIN experiment_exposures e
  ON a.experiment_id = e.experiment_id
 AND a.user_id = e.user_id
GROUP BY a.experiment_id, a.variant_id;

Outcome Events at Event Grain

SELECT user_id, event_name, event_ts, metric_value
FROM experiment_outcomes
WHERE event_name IN ('signup', 'purchase');

Time Windows and Attribution Logic

WITH first_exposure AS (
  SELECT experiment_id, user_id, MIN(exposure_ts) AS first_exposure_ts
  FROM experiment_exposures
  GROUP BY experiment_id, user_id
)
SELECT o.user_id, o.event_name
FROM experiment_outcomes o
JOIN first_exposure f
  ON o.experiment_id = f.experiment_id
 AND o.user_id = f.user_id
WHERE o.event_ts >= f.first_exposure_ts
  AND o.event_ts < f.first_exposure_ts + INTERVAL '7 days';

Data Quality Constraints and Deduplication

ALTER TABLE experiment_assignments
ADD CONSTRAINT uq_assignment UNIQUE (experiment_id, user_id);

ALTER TABLE experiment_outcomes
ADD CONSTRAINT uq_outcome_event UNIQUE (event_id);

Problem

Context

Core question

Explain how you would design a PostgreSQL schema to track A/B test assignments and outcomes. Your answer should cover:

The core tables and keys needed to represent experiments, variants, assignments, exposures, and outcome events
How to handle assignment grain, late-arriving events, and users with multiple events
How analysts would join the tables to compute experiment metrics such as conversion rate and guardrails
What constraints or modeling choices help prevent common data quality issues

Scope guidance

Key Concepts

Assignment Table as the Source of Truth

SELECT experiment_id, variant_id, COUNT(*) AS assigned_users
FROM experiment_assignments
GROUP BY experiment_id, variant_id;

Separate Exposure from Assignment

SELECT a.experiment_id, a.variant_id,
       COUNT(DISTINCT a.user_id) AS assigned_users,
       COUNT(DISTINCT e.user_id) AS exposed_users
FROM experiment_assignments a
LEFT JOIN experiment_exposures e
  ON a.experiment_id = e.experiment_id
 AND a.user_id = e.user_id
GROUP BY a.experiment_id, a.variant_id;

Outcome Events at Event Grain

SELECT user_id, event_name, event_ts, metric_value
FROM experiment_outcomes
WHERE event_name IN ('signup', 'purchase');

Time Windows and Attribution Logic

WITH first_exposure AS (
  SELECT experiment_id, user_id, MIN(exposure_ts) AS first_exposure_ts
  FROM experiment_exposures
  GROUP BY experiment_id, user_id
)
SELECT o.user_id, o.event_name
FROM experiment_outcomes o
JOIN first_exposure f
  ON o.experiment_id = f.experiment_id
 AND o.user_id = f.user_id
WHERE o.event_ts >= f.first_exposure_ts
  AND o.event_ts < f.first_exposure_ts + INTERVAL '7 days';

Data Quality Constraints and Deduplication

ALTER TABLE experiment_assignments
ADD CONSTRAINT uq_assignment UNIQUE (experiment_id, user_id);

ALTER TABLE experiment_outcomes
ADD CONSTRAINT uq_outcome_event UNIQUE (event_id);

Your answer

Try one AI text evaluation on us

Get structured feedback, scored against a 4-axis rubric. Premium unlocks unlimited.

0 wordstarget ~200

Up next

Partition Meta Events TableHard

Schema Design for Analytics vs OLTPMedium Working with PostgreSQL and CockroachDBEasy

Next question