Context
Aircall is testing a new onboarding prompt inside the Aircall Workspace that encourages newly invited teammates to connect their number, install the desktop app, and place their first call. After an initial readout, the PM says the treatment appears positive in France but flat in North America, and asks whether to ship globally, ship regionally, or keep testing.
Hypothesis Seed
The team believes a localized onboarding prompt with clearer setup steps will increase 7-day activation for newly invited seats by reducing setup friction. However, regional differences in language, call workflows, and sales-assist motion may cause heterogeneous treatment effects.
Constraints
- Eligible traffic: 18,000 newly invited seats per week across paid Aircall workspaces
- Region mix: 40% France, 45% North America, 15% Rest of Europe
- Baseline 7-day activation rate: 32% in France, 28% in North America
- Maximum experiment runtime: 4 weeks before the onboarding team must decide on rollout
- False positive cost: medium-high, because shipping a weak prompt globally adds UX clutter and engineering maintenance
- False negative cost: medium, because delaying a real activation improvement slows seat adoption and downstream calling volume
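The traffic and region-mix constraints above imply a fixed sample budget. The following is a minimal sketch of that arithmetic, assuming a two-arm test with a 50/50 split and that every eligible seat is enrolled for the full 4 weeks. Neither assumption is stated in the brief, and the names (`seats_per_arm`, `ARMS`) are illustrative only.

```python
# Figures taken from the Constraints list.
WEEKLY_SEATS = 18_000
MAX_WEEKS = 4
REGION_MIX = {"France": 0.40, "North America": 0.45, "Rest of Europe": 0.15}

# Assumption (not in the brief): two arms, equal allocation.
ARMS = 2


def seats_per_arm(weeks=MAX_WEEKS):
    """Return the number of newly invited seats available per arm, by region."""
    total = WEEKLY_SEATS * weeks
    return {region: round(total * share / ARMS) for region, share in REGION_MIX.items()}


budget = seats_per_arm()
for region, n in budget.items():
    print(f"{region}: {n} seats per arm")
print(f"Overall: {sum(budget.values())} seats per arm")
```

Under these assumptions the ceiling is 72,000 seats in total, which bounds any region-level analysis by the smallest region's share.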
Deliverables
- Define the primary metric, 2-4 guardrails, and an explicit MDE for the overall test, and explain whether region-level effects are confirmatory or exploratory.
- Calculate the sample size needed and assess whether the test is powered for the global effect, each region separately, or only the largest regions within the 4-week limit.
- Choose the unit of randomization, allocation, duration, and any stratification/blocking. Explain how you would handle regional heterogeneity in the design.
- Write a pre-registered analysis plan covering the main test, treatment-by-region interaction, multiple-comparison policy, and peeking policy.
- Explain how you would interpret a result where the experiment shows a lift in France but not in North America, including what additional checks you would run before recommending global ship, regional ship, iterate, or no-ship.
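As one possible starting point for the sample-size deliverable, the sketch below applies the standard normal-approximation formula for a two-sided two-proportion z-test, using only the Python standard library. The baselines come from the Constraints list. The 2-percentage-point absolute MDE, the 5% significance level, and the 80% power are hypothetical placeholders, not values given in the brief, and the helper name `n_per_arm` is illustrative.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(p_control, mde_abs, alpha=0.05, power=0.80):
    """Approximate per-arm sample size for a two-sided two-proportion z-test.

    Uses the unpooled-variance normal approximation:
    n = (z_{1-alpha/2} + z_{power})^2 * (p1(1-p1) + p2(1-p2)) / (p2 - p1)^2
    """
    p_treat = p_control + mde_abs
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treat * (1 - p_treat)
    return ceil((z_alpha + z_power) ** 2 * variance / mde_abs ** 2)


# Baseline 7-day activation rates from the brief.
BASELINES = {"France": 0.32, "North America": 0.28}

# Hypothetical absolute MDE (2 percentage points); replace with the chosen value.
MDE_ABS = 0.02

required = {region: n_per_arm(p, MDE_ABS) for region, p in BASELINES.items()}
for region, n in required.items():
    print(f"{region}: about {n} seats per arm for a {MDE_ABS:.0%} absolute lift")
```

The per-region requirements can then be compared with the seats each region can supply within the 4-week limit. A test of the treatment-by-region interaction generally needs considerably more data than a test of either regional effect alone, so this sketch speaks only to the main and per-region comparisons.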