Evaluate New Ad-Ranking Model

Context

AdSphere runs sponsored listings in a large marketplace app. The ads team has trained a new ad-ranking model that is expected to improve ad quality and monetization, and wants a rigorous online experiment before launch.

Hypothesis Seed

The new model uses richer user-context and conversion features, and the team believes it will increase revenue per 1,000 ad impressions by showing more relevant ads. However, leadership is concerned that a model optimized too aggressively for revenue could hurt user engagement, advertiser fairness, or system latency.

Constraints

Eligible traffic: 12 million ad impressions/day across homepage, search, and feed surfaces
Average baseline click-through rate (CTR): 1.8%
Average baseline revenue per 1,000 impressions (RPM): $24.00, with impression-level standard deviation approximately $120 when expressed on a per-1,000-impression normalized basis
Maximum experiment duration: 14 days
Randomization must be chosen to avoid users seeing inconsistent ad ordering within a session
False positives are expensive because a bad launch can reduce marketplace trust and advertiser ROI; false negatives are acceptable but should be minimized
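
As a rough illustration of the sample-size arithmetic these constraints imply, the sketch below applies the standard two-sample normal approximation for a difference in means. The 1% relative MDE, two-sided alpha of 0.05, 80% power, 50/50 split, and the `design_effect` parameter are illustrative assumptions, not values given in the prompt. It also assumes the $120 figure can be treated as the per-impression standard deviation in RPM units.

```python
from math import ceil
from statistics import NormalDist

# Values taken from the constraints above.
BASELINE_RPM = 24.00           # dollars per 1,000 impressions
SIGMA_RPM = 120.0              # impression-level SD on the RPM scale
DAILY_IMPRESSIONS = 12_000_000

# Illustrative assumption: a 1% relative lift is the smallest effect worth detecting.
REL_MDE = 0.01


def impressions_per_arm(sigma, delta, alpha=0.05, power=0.80, design_effect=1.0):
    """Two-sample z-approximation: n = 2 * (z_{alpha/2} + z_beta)^2 * sigma^2 / delta^2.

    design_effect is a hypothetical multiplier (> 1) for variance inflation when
    impressions are clustered within randomized users.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    n = 2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2
    return ceil(n * design_effect)


delta = BASELINE_RPM * REL_MDE                     # absolute MDE in dollars
n_per_arm = impressions_per_arm(SIGMA_RPM, delta)
days_needed = 2 * n_per_arm / DAILY_IMPRESSIONS    # assumes a 50/50 split of all eligible traffic

print(f"impressions per arm: {n_per_arm:,}")
print(f"days at full traffic: {days_needed:.2f}")
```

Under these assumptions the requirement is roughly 3.9 million impressions per arm, which is under one day of eligible traffic. A stricter alpha (reasonable given the cost of false positives) or a design effect above 1 from user-level randomization would raise that figure, so the 14-day budget is better spent on covering weekly cycles than on raw sample size.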

Deliverables

Define the experiment hypothesis, including the primary metric, 2-4 guardrails, and a clear minimum detectable effect (MDE).
Compute the required sample size and estimate whether the test can be completed within the 14-day traffic budget.
Choose the unit of randomization, allocation, and duration; explain trade-offs such as user-level consistency, interference, and variance.
Pre-register an analysis plan: statistical test, treatment of multiple metrics, peeking policy, and how you will check for sample ratio mismatch.
State a clear ship / don’t ship / iterate rule that respects both the primary metric and guardrails.
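
The sample ratio mismatch check mentioned in the analysis-plan deliverable can be sketched as a chi-square goodness-of-fit test on the number of randomized units per arm. This is a minimal standard-library sketch; the function name `srm_p_value`, the example counts, and the 0.001 alert threshold are illustrative assumptions rather than anything specified in the prompt.

```python
from math import erfc, sqrt


def srm_p_value(control_units, treatment_units, expected_treatment_share=0.5):
    """Chi-square goodness-of-fit test (1 degree of freedom) for the allocation split.

    For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)), so no external
    statistics library is needed.
    """
    total = control_units + treatment_units
    expected_treatment = total * expected_treatment_share
    expected_control = total - expected_treatment
    chi2 = ((control_units - expected_control) ** 2 / expected_control
            + (treatment_units - expected_treatment) ** 2 / expected_treatment)
    return erfc(sqrt(chi2 / 2))


# Hypothetical counts of randomized units per arm.
SRM_THRESHOLD = 0.001  # assumed alert threshold; choose and pre-register your own

p_ok = srm_p_value(500_000, 501_000)
p_bad = srm_p_value(500_000, 510_000)

print(f"balanced-looking split: p = {p_ok:.3f}, SRM flagged: {p_ok < SRM_THRESHOLD}")
print(f"skewed split: p = {p_bad:.2e}, SRM flagged: {p_bad < SRM_THRESHOLD}")
```

The check should count the unit that was actually randomized (for example users), not impressions, since impression counts can legitimately differ between arms if the treatment changes behavior.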

Assume you may use asymptotic approximations, but your design should be robust enough for production decision-making. Be explicit about how you would handle pitfalls such as novelty effects, network interference from advertisers adapting bids, and any mismatch between unit of randomization and unit of analysis.
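
One common way to handle a mismatch between the unit of randomization and the unit of analysis is the delta method for a ratio metric: if users are randomized but RPM is defined per impression, revenue and impressions are aggregated per user and the ratio's variance is estimated from user-level totals. The sketch below assumes that setup; the function name `ratio_and_se` and the toy inputs are hypothetical.

```python
from math import sqrt
from statistics import fmean


def ratio_and_se(revenue_per_user, impressions_per_user):
    """Ratio of means (revenue per impression) with a delta-method standard error.

    Linearization: e_i = y_i - r * x_i, and Var(r) ~= Var(e) / (n * mean(x)^2).
    The residuals e_i sum to zero by construction, so their sample variance
    is sum(e_i^2) / (n - 1).
    """
    n = len(revenue_per_user)
    mean_x = fmean(impressions_per_user)
    r = fmean(revenue_per_user) / mean_x
    residuals = [y - r * x for y, x in zip(revenue_per_user, impressions_per_user)]
    var_resid = sum(e * e for e in residuals) / (n - 1)
    se = sqrt(var_resid / n) / mean_x
    return r, se


# Toy user-level totals (dollars, impression counts); purely illustrative.
revenue = [0.00, 0.12, 0.00, 0.48, 0.05]
impressions = [10, 25, 5, 40, 20]

r, se = ratio_and_se(revenue, impressions)
print(f"RPM estimate: {1000 * r:.2f}, delta-method SE: {1000 * se:.2f}")
```

Computing the standard error this way for each arm keeps the user as the independent unit, so the resulting test does not treat correlated impressions from the same user as independent observations.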
