You work on a live ops feature in a mobile game and want to run an A/B test before rolling it out broadly. The team is asking how large the experiment should be and what minimum effect is worth detecting, since traffic is limited and the feature could also affect player experience outside the main metric.
How do you choose the sample size or minimum detectable effect (MDE) for this experiment, and what would you pre-register before launch so the analysis is defensible? Include how you would define the primary metric, guardrails, and the ship/don't-ship rule.