Context
BCG Digital Ventures is testing a new referral prompt inside Venture Studio Connect, an internal collaboration and portfolio-network surface where founders and operators can invite peers into deal-room discussions. Because invited users can influence each other, a standard user-level A/B test may violate the no-interference assumption (SUTVA): one user's assignment can change another user's outcome.
Hypothesis Seed
The growth team wants to test a redesigned referral prompt that highlights mutual connections and recent portfolio activity. They believe it will increase weekly invite acceptance, but they are concerned that treated users may affect untreated users through shared team spaces, cross-invites, and message forwarding.
Constraints
- Eligible traffic: 120,000 active users per week across 6,000 collaboration teams
- Average team size: 20 users; 35% of users belong to more than one team
- Maximum experiment window: 28 days; leadership needs a ship/no-ship decision this month
- False positives are costly because a bad rollout could degrade trust and spam teams; false negatives are acceptable if the design is ambiguous
- Engineering can support either user-level randomization or team-level randomization, but not both simultaneously
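A minimal sketch of what team-level randomization could look like, assuming a 50/50 split stratified by team size. The helper names (`size_bucket`, `assign_teams`, `mixed_exposure_share`), the size buckets, and the salt string are hypothetical illustrations, not part of Venture Studio Connect. The last helper measures how many users have teams in both arms, which is the interference that the 35% multi-team overlap can still cause even when whole teams are randomized.

```python
import hashlib
from collections import defaultdict


def size_bucket(size: int) -> int:
    """Hypothetical strata by team size: 0-9, 10-19, 20-29, 30-39, 40+ members."""
    return min(size // 10, 4)


def assign_teams(team_sizes: dict[str, int], salt: str = "referral-prompt-v1") -> dict[str, str]:
    """Randomize whole teams 50/50 within size strata.

    Teams in each stratum are put in a reproducible pseudo-random order via a
    salted SHA-256 hash, then alternated between arms, so each stratum is split
    as evenly as possible (treatment gets the extra team when a stratum is odd).
    """
    strata: dict[int, list[str]] = defaultdict(list)
    for team_id, size in team_sizes.items():
        strata[size_bucket(size)].append(team_id)

    assignment: dict[str, str] = {}
    for teams in strata.values():
        teams.sort(key=lambda t: hashlib.sha256(f"{salt}:{t}".encode()).hexdigest())
        for i, team_id in enumerate(teams):
            assignment[team_id] = "treatment" if i % 2 == 0 else "control"
    return assignment


def mixed_exposure_share(user_teams: dict[str, list[str]], assignment: dict[str, str]) -> float:
    """Share of users whose teams land in both arms (residual interference risk)."""
    mixed = sum(1 for teams in user_teams.values()
                if len({assignment[t] for t in teams}) > 1)
    return mixed / len(user_teams)


# Toy usage with made-up team sizes and memberships
team_sizes = {f"team-{i}": 5 + (i % 40) for i in range(6000)}
assignment = assign_teams(team_sizes)
user_teams = {"u1": ["team-1"], "u2": ["team-1", "team-2"], "u3": ["team-2", "team-3"]}
print(sum(arm == "treatment" for arm in assignment.values()))  # 3000: every stratum here is even-sized
print(mixed_exposure_share(user_teams, assignment))
```

Stratifying on team size keeps the largest teams, which contribute the most users, from clustering in one arm by chance. The mixed-exposure share is only a diagnostic; how to analyze users who straddle both arms still has to be decided and pre-registered separately.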
Deliverables
- Define the null and alternative hypotheses, including whether you would use a one-sided or two-sided test.
- Propose the experiment design, including the unit of randomization, allocation, duration, and any stratification or clustering strategy needed to handle interference.
- Specify the primary metric, 2-4 guardrail metrics, and at least one explicit MDE. Explain why the metric definitions fit this setting.
- Calculate the required sample size using real numbers and translate it into expected runtime under the available traffic. If clustering changes the effective sample size, show that adjustment.
- Pre-register an analysis plan: statistical test, peeking policy, multiple-comparison handling, SRM checks, and a clear ship/no-ship rule that respects guardrails.
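A minimal sketch of an SRM check, assuming a planned 50/50 split. It runs a chi-square goodness-of-fit test (1 degree of freedom) on counts of the randomization unit. Under team-level randomization that means team counts: user counts per arm vary more than a simple binomial when users arrive in clusters, so a naive user-level check would raise false alarms. The helper name `srm_pvalue` and the 0.001 alert threshold are placeholders; 0.001 is a common convention, not a value taken from this brief.

```python
import math


def srm_pvalue(n_treatment: int, n_control: int, expected_share: float = 0.5) -> float:
    """Chi-square goodness-of-fit p-value (1 df) for observed vs. planned allocation."""
    total = n_treatment + n_control
    exp_t = total * expected_share
    exp_c = total * (1 - expected_share)
    chi2 = (n_treatment - exp_t) ** 2 / exp_t + (n_control - exp_c) ** 2 / exp_c
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


SRM_ALPHA = 0.001  # hypothetical pre-registered alert threshold

# Toy counts of randomized teams per arm
for n_t, n_c in [(3000, 3000), (3030, 2970), (3200, 2800)]:
    p = srm_pvalue(n_t, n_c)
    print(n_t, n_c, f"p={p:.2e}", "SRM" if p < SRM_ALPHA else "ok")
```

The erfc identity avoids a SciPy dependency for the two-arm case. With more arms, the same statistic needs a chi-square tail with more degrees of freedom.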
Assume the current weekly invite-acceptance rate is 18% at the exposed-user level, and that the team considers an absolute lift smaller than 2 percentage points (that is, from 18% to anything below 20%) too small to justify the rollout risk.
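A minimal sample-size sketch using the numbers above: baseline 18%, absolute MDE 2 percentage points. It assumes a two-proportion z-test with two-sided alpha = 0.05 and 80% power (placeholders, not values from the brief), then inflates the result by the design effect 1 + (m − 1)·ICC for team-level randomization. The ICC of 0.05 is a hypothetical placeholder that should be estimated from historical team data. The formula also assumes equal-sized clusters, so variable team sizes would push the real design effect higher, and overlapping memberships complicate it further.

```python
import math
from statistics import NormalDist


def n_per_arm(p_control: float, mde_abs: float, alpha: float = 0.05,
              power: float = 0.80, two_sided: bool = True) -> int:
    """Users per arm for a two-proportion z-test at the given absolute MDE."""
    p_treat = p_control + mde_abs
    p_bar = (p_control + p_treat) / 2
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2) if two_sided else z.inv_cdf(1 - alpha)
    z_beta = z.inv_cdf(power)
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p_control * (1 - p_control)
                                      + p_treat * (1 - p_treat))) ** 2
    return math.ceil(numerator / mde_abs ** 2)


def design_effect(mean_cluster_size: float, icc: float) -> float:
    """Design effect for equal-sized clusters: 1 + (m - 1) * ICC."""
    return 1 + (mean_cluster_size - 1) * icc


n_iid = n_per_arm(0.18, 0.02)                  # about 6,039 users per arm if users were independent
deff = design_effect(20, icc=0.05)             # ICC = 0.05 is a hypothetical placeholder
n_clustered = math.ceil(n_iid * deff)          # effective users needed per arm under clustering
teams_needed = math.ceil(n_clustered / 20)     # assumes the stated average of 20 users per team
teams_available = 6000 // 2                    # 50/50 team-level split of 6,000 teams
print(n_iid, round(deff, 2), n_clustered, teams_needed, teams_available)
```

Under these placeholder assumptions the requirement (about 589 teams per arm) sits well below the 3,000 teams per arm that a 50/50 split provides, so the weekly metric cadence and the 28-day window, not raw traffic, would likely bound the runtime. A larger ICC or a stricter alpha, which the high cost of false positives may justify, raises the requirement and should be rechecked with this calculation.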