Regional Lift in Aircall Test

Medium

A/B Testing & Experimentation

Asked at 1 company

A/B Testing, Experimentation, Causal Inference

Problem

Context

Aircall is testing a new onboarding prompt inside the Aircall Workspace that encourages newly invited teammates to connect their number, install the desktop app, and place their first call. After an initial readout, the PM says the treatment appears positive in France but flat in North America, and asks whether to ship globally, ship regionally, or keep testing.

Hypothesis Seed

The team believes a localized onboarding prompt with clearer setup steps will increase 7-day activation for newly invited seats by reducing setup friction. However, regional differences in language, call workflows, and sales-assist motion may cause heterogeneous treatment effects.

Constraints

Eligible traffic: 18,000 newly invited seats per week across paid Aircall workspaces
Region mix: 40% France, 45% North America, 15% Rest of Europe
Baseline 7-day activation rate: 32% in France, 28% in North America
Maximum experiment runtime: 4 weeks before the onboarding team must decide on rollout
False positive cost: medium-high, because shipping a weak prompt globally adds UX clutter and engineering maintenance
False negative cost: medium, because delaying a real activation improvement slows seat adoption and downstream calling volume
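The traffic and baseline figures above bound what the test can detect in four weeks. A minimal sketch of that arithmetic follows. It assumes a 50/50 allocation, a two-sided alpha of 0.05, 80% power, and independent seats; none of these is given in the problem, and seats invited into the same workspace are probably correlated, so real detectable effects would be larger. No baseline is stated for Rest of Europe, so that region is left out of the MDE calculation. The names `seats_per_arm` and `approx_mde` are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

# Figures taken from the Constraints list.
WEEKLY_SEATS = 18_000
MAX_WEEKS = 4
REGION_SHARE = {"France": 0.40, "North America": 0.45, "Rest of Europe": 0.15}
BASELINE = {"France": 0.32, "North America": 0.28}  # Rest of Europe not given

# Assumptions, not part of the problem statement.
ALPHA = 0.05   # two-sided significance level
POWER = 0.80   # target power


def seats_per_arm(region, weeks=MAX_WEEKS):
    """Seats available per arm in a region under an assumed 50/50 split."""
    return WEEKLY_SEATS * weeks * REGION_SHARE[region] / 2


def approx_mde(baseline, n_per_arm, alpha=ALPHA, power=POWER):
    """Approximate absolute MDE for a two-proportion z-test.

    Uses the baseline variance for both arms, which is a common
    planning shortcut rather than an exact power calculation.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return z * sqrt(2 * baseline * (1 - baseline) / n_per_arm)


for region, p in BASELINE.items():
    n = seats_per_arm(region)
    mde = approx_mde(p, n)
    print(f"{region}: {n:,.0f} seats/arm, MDE about {mde * 100:.2f} pp")
```

Under these assumptions each of the two large regions can detect an absolute lift of roughly 1.4 to 1.5 percentage points on its own within the 4-week limit. The pooled test can detect a smaller effect, and Rest of Europe, with 15% of traffic, a noticeably larger one.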

Deliverables

Define the primary metric, 2-4 guardrails, and an explicit MDE for the overall test, and explain whether region-level effects are confirmatory or exploratory.
Calculate the sample size needed and assess whether the test is powered for the global effect, each region separately, or only the largest regions within the 4-week limit.
Choose the unit of randomization, allocation, duration, and any stratification/blocking. Explain how you would handle regional heterogeneity in the design.
Write a pre-registered analysis plan covering the main test, treatment-by-region interaction, multiple-comparison policy, and peeking policy.
Explain how you would interpret a result where the experiment shows a lift in France but not in North America, including what additional checks you would run before recommending global ship, regional ship, iterate, or no-ship.
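For the "lift in France but not in North America" scenario, the relevant comparison is whether the two regional lifts differ from each other, not whether one is significant and the other is not. The sketch below shows one simple way to check this: a z-test on the difference between two regional lifts. The counts are hypothetical and were chosen only to illustrate the calculation; the function names are also illustrative. It assumes independent seats and ignores workspace-level clustering and any multiple-comparison adjustment.

```python
from math import sqrt
from statistics import NormalDist


def lift_and_se(conv_t, n_t, conv_c, n_c):
    """Absolute lift (treatment minus control) and its unpooled standard error."""
    p_t, p_c = conv_t / n_t, conv_c / n_c
    se = sqrt(p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c)
    return p_t - p_c, se


def interaction_test(region_a, region_b):
    """Two-sided z-test for the difference between two regional lifts."""
    lift_a, se_a = lift_and_se(*region_a)
    lift_b, se_b = lift_and_se(*region_b)
    diff = lift_a - lift_b
    z = diff / sqrt(se_a**2 + se_b**2)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return diff, z, p_value


# Hypothetical counts: (treated activations, treated seats,
#                       control activations, control seats)
france = (4896, 14400, 4608, 14400)          # 34% vs 32%
north_america = (4536, 16200, 4536, 16200)   # 28% vs 28%

diff, z, p_value = interaction_test(france, north_america)
print(f"lift difference = {diff * 100:.2f} pp, z = {z:.2f}, p = {p_value:.4f}")
```

If the interaction test is not significant, the regional split is weak evidence of real heterogeneity and is better treated as exploratory. If it is significant and region-level effects were pre-registered, a regional ship becomes easier to defend.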
