You work on an HR software product and your team has redesigned the employee payroll onboarding flow to reduce setup friction for admins. The team believes the new flow will increase completed payroll setups by making required steps clearer, but PMs are already asking to check results every day because payroll launches are high visibility. You need to design an experiment that can answer whether the redesign should ship without inflating false positives from repeated looks at the data.
How would you design and analyze this experiment so that interim result checks do not invalidate the conclusion, and how would you decide whether to ship if stakeholders want updates before the test is complete?
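To make the "repeated looks" risk in the scenario concrete, here is a minimal simulation sketch. It assumes an A/A setup (no true difference between flows), a two-proportion z-test on completed payroll setups, and one look per day. The function names, the 14-day duration, the 100 admins per arm per day, and the 30% baseline completion rate are all hypothetical illustration values, not figures from the scenario.

```python
import math
import random


def z_stat(conv_a, n_a, conv_b, n_b):
    """Pooled two-proportion z statistic (treatment minus control)."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0
    return (conv_b / n_b - conv_a / n_a) / se


def simulate_false_positive_rates(n_sims=1000, days=14, daily_n=100,
                                  base_rate=0.3, z_crit=1.96, seed=0):
    """Simulate A/A tests and compare two decision rules.

    All defaults are hypothetical. Returns a tuple:
    (rate of 'significant on ANY daily look', rate of 'significant at final look only').
    """
    rng = random.Random(seed)
    peek_hits = 0
    fixed_hits = 0
    for _ in range(n_sims):
        conv_a = conv_b = n_a = n_b = 0
        crossed_any_day = False
        z = 0.0
        for _day in range(days):
            # Both arms share the same true completion rate (no real effect).
            conv_a += sum(rng.random() < base_rate for _ in range(daily_n))
            conv_b += sum(rng.random() < base_rate for _ in range(daily_n))
            n_a += daily_n
            n_b += daily_n
            z = z_stat(conv_a, n_a, conv_b, n_b)
            if abs(z) > z_crit:
                crossed_any_day = True
        peek_hits += crossed_any_day
        # After the loop, z holds the statistic from the final planned look.
        fixed_hits += abs(z) > z_crit
    return peek_hits / n_sims, fixed_hits / n_sims


if __name__ == "__main__":
    peek_rate, fixed_rate = simulate_false_positive_rates()
    print(f"false positive rate, stop on any daily look: {peek_rate:.3f}")
    print(f"false positive rate, single final analysis:  {fixed_rate:.3f}")
```

Under these assumptions, the "any daily look" rule should reject noticeably more often than the nominal 5% even though there is no real effect, while the single final analysis should stay close to it. That gap is what a pre-specified sequential design (for example, group sequential boundaries or an alpha-spending rule) is meant to control, so that daily updates can be shared without turning each one into an unadjusted ship decision.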