You work on a consumer knowledge platform and are testing a new ranking treatment on the home feed that reorders answer recommendations to increase answer-upvote rate. The team believes the change should improve engagement because it surfaces fresher and more personalized content. After 10 days, the treatment shows a statistically significant lift on the primary metric, but the observed traffic split is 57/43 instead of the planned 50/50. Product asks whether you can still trust the result and ship.
How would you design and analyze this experiment, and how would you handle the case where the primary metric is positive but the sample ratio is off? Explain what you would pre-register, how you would diagnose the mismatch, and under what conditions you would ship, rerun, or invalidate the test.