You are evaluating a retail growth initiative that could affect both digital behavior and in-store outcomes. The intervention may be targeted at individual users, but some effects could spill across households, stores, or regions, making it unclear whether a holdout test, geo test, or user-level A/B test is the right design.
How would you decide whether to use a holdout test, geo test, or user-level A/B test for this initiative? Walk through how you would choose the unit of randomization, define success metrics and guardrails, and determine whether the design is sufficiently powered to support a launch decision.
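For the power part of the question, a rough first check is to compare the minimum detectable effect (MDE) of a user-level design with that of a cluster-randomized (geo or store) design. The sketch below uses the standard two-arm normal approximation and inflates variance by the design effect 1 + (m - 1) * ICC, where m is the average cluster size and ICC is the intra-cluster correlation. The function names and every input value (standard deviation, sample size, cluster size, ICC) are hypothetical placeholders, not figures from the scenario.

```python
from math import sqrt
from statistics import NormalDist


def design_effect(cluster_size: float, icc: float) -> float:
    """Variance inflation from randomizing clusters instead of individuals."""
    return 1.0 + (cluster_size - 1.0) * icc


def minimum_detectable_effect(
    sd: float,
    n_per_arm: int,
    cluster_size: float = 1.0,
    icc: float = 0.0,
    alpha: float = 0.05,
    power: float = 0.80,
) -> float:
    """Approximate MDE for a two-sided, two-arm difference in means.

    Assumes equal arm sizes, equal variance, and a normal approximation.
    n_per_arm counts individuals; clustering enters only via the design effect.
    """
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_power = NormalDist().inv_cdf(power)
    deff = design_effect(cluster_size, icc)
    return (z_alpha + z_power) * sd * sqrt(2.0 * deff / n_per_arm)


# Illustrative inputs only (assumed): revenue per user with SD of 40,
# 200,000 users per arm, geos of about 10,000 users, ICC of 0.01.
user_level_mde = minimum_detectable_effect(sd=40.0, n_per_arm=200_000)
geo_level_mde = minimum_detectable_effect(
    sd=40.0, n_per_arm=200_000, cluster_size=10_000, icc=0.01
)

print(f"user-level MDE: {user_level_mde:.3f}")
print(f"geo-level MDE:  {geo_level_mde:.3f}")
```

If the geo-level MDE is larger than the smallest lift that would justify a launch, the geo design is underpowered as specified. The usual remedies are more or smaller clusters, a longer test window, or variance reduction with pre-period covariates. This approximation is least reliable when the number of clusters is small, so a simulation on historical data is a sensible follow-up before committing to a design.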