Experiment Design Under User Spillovers

Context

BCG Digital Ventures is testing a new referral prompt inside Venture Studio Connect, an internal collaboration and portfolio-network surface where founders and operators can invite peers into deal-room discussions. Because invited users can influence one another, a standard user-level A/B test may violate the no-interference assumption (SUTVA): one user's outcome can depend on another user's treatment assignment.

Hypothesis Seed

The growth team wants to test a redesigned referral prompt that highlights mutual connections and recent portfolio activity. They believe it will increase weekly invite acceptance, but they are concerned that treated users may affect untreated users through shared team spaces, cross-invites, and message forwarding.

Constraints

Eligible traffic: 120,000 active users per week across 6,000 collaboration teams
Average team size: 20 users; 35% of users belong to more than one team
Maximum experiment window: 28 days; leadership needs a ship/no-ship decision this month
False positives are costly because a bad rollout could degrade trust and spam teams; false negatives are acceptable if the evidence is ambiguous
Engineering can support either user-level randomization or team-level randomization, but not both simultaneously

Deliverables

Define the null and alternative hypotheses, including whether you would use a one-sided or two-sided test.
Propose the experiment design, including the unit of randomization, allocation, duration, and any stratification or clustering strategy needed to handle interference.
Specify the primary metric, 2-4 guardrail metrics, and at least one explicit MDE. Explain why the metric definitions fit this setting.
Calculate the required sample size using real numbers and translate it into expected runtime under the available traffic. If clustering changes the effective sample size, show that adjustment.
Pre-register an analysis plan: statistical test, peeking policy, multiple-comparison handling, SRM checks, and a clear ship / don’t-ship rule that respects guardrails.
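
As a hedged illustration of the SRM check named in the last deliverable, the sketch below runs a one-degree-of-freedom chi-square goodness-of-fit test on the number of units assigned to each arm. The function name `srm_p_value`, the 50/50 allocation, the 0.001 alert threshold, and the counts are all assumptions made for illustration; none of them come from the brief. Under team-level randomization the counts should be teams, not users.

```python
from math import erfc, sqrt


def srm_p_value(n_control, n_treatment, expected_treatment_share=0.5):
    """P-value of a chi-square goodness-of-fit test (1 df) on arm counts."""
    total = n_control + n_treatment
    expected_treatment = total * expected_treatment_share
    expected_control = total - expected_treatment
    chi2 = ((n_control - expected_control) ** 2 / expected_control
            + (n_treatment - expected_treatment) ** 2 / expected_treatment)
    # For 1 degree of freedom the chi-square survival function
    # reduces to erfc(sqrt(chi2 / 2)).
    return erfc(sqrt(chi2 / 2))


# Hypothetical team counts per arm (not from the brief).
SRM_ALERT_THRESHOLD = 0.001  # assumed convention, not from the brief
p_value = srm_p_value(n_control=2990, n_treatment=3010)
print("SRM suspected" if p_value < SRM_ALERT_THRESHOLD else "No SRM signal")
```

A p-value below the chosen threshold suggests the assignment mechanism did not deliver the planned split, in which case the effect estimate should not be trusted until the cause is found.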

Assume the current weekly invite-acceptance rate is 18% at the exposed-user level, and the team considers a lift smaller than 2 percentage points absolute too small to justify rollout risk.
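
One way to sketch the sample-size deliverable, using the 18% baseline, the 2 percentage point threshold, and the average team size of 20 given above. Everything else is an assumption made for illustration: a two-sided alpha of 0.05, 80% power, the normal-approximation formula for two proportions, and an intra-cluster correlation (`ASSUMED_ICC`) of 0.05, which would need to be estimated from historical data. The sketch also ignores the 35% of users who belong to more than one team, so treat it as a starting point, not a final answer.

```python
from math import ceil
from statistics import NormalDist


def users_per_arm(p_base, mde_abs, alpha=0.05, power=0.80, two_sided=True):
    """Users per arm for a two-proportion z-test, ignoring clustering."""
    p_treat = p_base + mde_abs
    tail = alpha / 2 if two_sided else alpha
    z_alpha = NormalDist().inv_cdf(1 - tail)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_base * (1 - p_base) + p_treat * (1 - p_treat)
    return (z_alpha + z_beta) ** 2 * variance / mde_abs ** 2


def design_effect(avg_cluster_size, icc):
    """Variance inflation from cluster randomization: 1 + (m - 1) * ICC."""
    return 1 + (avg_cluster_size - 1) * icc


ASSUMED_ICC = 0.05   # assumption: not given in the brief
AVG_TEAM_SIZE = 20   # from the brief

n_unclustered = users_per_arm(p_base=0.18, mde_abs=0.02)
deff = design_effect(AVG_TEAM_SIZE, ASSUMED_ICC)
n_clustered = ceil(n_unclustered * deff)
teams_per_arm = ceil(n_clustered / AVG_TEAM_SIZE)

print(f"users per arm, no clustering: {ceil(n_unclustered)}")
print(f"design effect: {deff:.2f}")
print(f"users per arm, clustered: {n_clustered}")
print(f"teams per arm: {teams_per_arm}")
```

Under these assumptions the unadjusted requirement is about 6,000 users per arm, and the design effect roughly doubles it. The result is sensitive to the ICC: with 6,000 teams available, rerunning the sketch at several plausible ICC values shows how much headroom the 28-day window leaves.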
