
At Meta, teams running experiments on surfaces like Facebook Feed or Instagram Reels need schemas that support both reliable analysis and efficient SQL querying. A weak design makes it hard to answer basic questions like who was assigned, when they were exposed, and whether they converted.
Explain how you would design a PostgreSQL schema to track A/B test assignments and outcomes. Your answer should cover:
The interviewer expects a practical schema design discussion grounded in SQL analytics. Focus on table structure, join paths, event timestamps, deduplication strategy, and trade-offs between normalized event tables and pre-aggregated outcome tables.
The assignment table should record the unit of randomization, experiment, variant, and assignment timestamp. This table is the denominator for most experiment analyses because it defines who was eligible and assigned, even if they never generated downstream events.
SELECT experiment_id, variant_id, COUNT(*) AS assigned_users
FROM experiment_assignments
GROUP BY experiment_id, variant_id;
Assignment does not always mean the user actually saw the treatment. A separate exposure table lets analysts distinguish intent-to-treat analysis from exposed-user analysis and helps diagnose delivery issues.
SELECT a.experiment_id, a.variant_id,
COUNT(DISTINCT a.user_id) AS assigned_users,
COUNT(DISTINCT e.user_id) AS exposed_users
FROM experiment_assignments a
LEFT JOIN experiment_exposures e
ON a.experiment_id = e.experiment_id
AND a.user_id = e.user_id
GROUP BY a.experiment_id, a.variant_id;
Store outcomes such as click, purchase, or session length in an event table at the lowest useful grain. This preserves flexibility because analysts can later define different metrics, attribution windows, and guardrails without changing raw data.
SELECT user_id, event_name, event_ts, metric_value
FROM experiment_outcomes
WHERE event_name IN ('signup', 'purchase');
Experiment analysis depends on whether an outcome happened after assignment or after first exposure and within a defined window. The schema must include timestamps that make these filters explicit and reproducible in SQL.
WITH first_exposure AS (
SELECT experiment_id, user_id, MIN(exposure_ts) AS first_exposure_ts
FROM experiment_exposures
GROUP BY experiment_id, user_id
)
SELECT o.user_id, o.event_name
FROM experiment_outcomes o
JOIN first_exposure f
ON o.experiment_id = f.experiment_id
AND o.user_id = f.user_id
WHERE o.event_ts >= f.first_exposure_ts
AND o.event_ts < f.first_exposure_ts + INTERVAL '7 days';
Good schema design uses primary keys, foreign keys, uniqueness constraints, and event identifiers to prevent double counting and invalid joins. Without these controls, experiment metrics can be wrong even if the SQL query looks correct.
ALTER TABLE experiment_assignments
ADD CONSTRAINT uq_assignment UNIQUE (experiment_id, user_id);
ALTER TABLE experiment_outcomes
ADD CONSTRAINT uq_outcome_event UNIQUE (event_id);