You work on an HR software platform, and your team has redesigned the employee onboarding checklist shown to new admins during company setup. The team believes the new checklist will increase the rate at which admins complete payroll setup, but that completion rate is noisy because companies arrive with very different levels of pre-existing setup progress. You want to use CUPED with pre-experiment covariates to reduce variance, so the test can detect a smaller effect without running longer.
How would you design this experiment so that CUPED is used appropriately to improve power while keeping the analysis valid? Explain how you would choose the covariate, define success and guardrails, size the test, and decide whether to ship.
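For reference, the CUPED adjustment named in the question can be sketched as follows. This is a minimal illustration on synthetic data, not a prescribed answer: the covariate `pre_progress` (fraction of setup steps already done before assignment), the helper `cuped_adjust`, and all simulated rates are assumptions made up for the example. The adjustment subtracts `theta * (x - mean(x))` from each outcome, with `theta = cov(x, y) / var(x)` estimated on the pooled data from both arms.

```python
import numpy as np


def cuped_adjust(y, x):
    """Return (adjusted outcomes, theta) for the CUPED adjustment
    y_adj = y - theta * (x - mean(x)), with theta = cov(x, y) / var(x)."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    var_x = np.var(x, ddof=1)
    if var_x == 0:
        # A constant covariate carries no information; leave y unchanged.
        return y.copy(), 0.0
    theta = np.cov(x, y, ddof=1)[0, 1] / var_x
    return y - theta * (x - x.mean()), theta


# Synthetic data (all numbers are invented for illustration only).
rng = np.random.default_rng(0)
n = 20_000
# Hypothetical pre-experiment covariate, measured before assignment.
pre_progress = rng.uniform(0.0, 1.0, n)
# Random assignment, independent of the covariate.
treated = rng.integers(0, 2, n).astype(bool)
# Completion probability depends on prior progress plus a small lift.
p_complete = 0.2 + 0.5 * pre_progress + 0.02 * treated
completed = (rng.uniform(0.0, 1.0, n) < p_complete).astype(float)

adjusted, theta = cuped_adjust(completed, pre_progress)

# Difference in means before and after adjustment.
raw_effect = completed[treated].mean() - completed[~treated].mean()
cuped_effect = adjusted[treated].mean() - adjusted[~treated].mean()

# Fraction of outcome variance remaining after adjustment
# (approximately 1 - corr(x, y)**2).
variance_ratio = adjusted.var(ddof=1) / completed.var(ddof=1)

print(f"theta={theta:.3f}")
print(f"raw effect={raw_effect:.4f}, CUPED effect={cuped_effect:.4f}")
print(f"variance ratio={variance_ratio:.3f}")
```

The sketch relies on the covariate being measured strictly before assignment, so the treatment cannot influence it; under that assumption the adjustment leaves the overall mean unchanged and only reduces variance.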