Context
Uber Eats is testing a new courier dispatch policy in a mid-sized city. The policy prioritizes treated couriers for nearby batched offers, which may improve fulfillment efficiency for treated trips but can also change marketplace conditions for untreated couriers and merchants.
Hypothesis Seed
The Growth team believes the new dispatch policy will increase completed trips per online courier-hour and reduce courier wait time by matching treated couriers more efficiently. The challenge is that standard courier-level (user-level) randomization may create interference: control couriers operate in the same marketplace as treated couriers and compete for the same offers, so they can be indirectly harmed or helped when treated couriers receive different dispatches.
Constraints
- Eligible market: 1 city with about 120,000 completed Eats trips per day
- Active supply: ~9,000 courier online-hours per day and ~4,500 merchants/day
- Maximum experiment window: 28 days; leadership wants a ship decision in 4 weeks
- False positives are costly because a bad dispatch policy can hurt merchant prep flow and courier earnings; false negatives are also costly because the city is nearing peak seasonal demand
- You may assume baseline completed trips per courier-hour is 1.80 with standard deviation 1.10
- You may assume baseline merchant cancellation rate is 2.4% and baseline courier earnings per online hour is $24.00
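To illustrate how the baseline figures above feed a power calculation, here is a minimal sketch using the standard two-sample normal approximation. The 2% relative MDE, the 5% two-sided alpha, the 80% power, and the cluster size and intra-cluster correlation passed to `design_effect` are all illustrative assumptions, not values given in this brief. The unadjusted result counts independent courier-hour observations per arm; under a geo-cluster or switchback design the observations are correlated, so the design-effect inflation is what actually matters.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(delta, sd, alpha=0.05, power=0.80):
    """Observations per arm for a two-sided, two-sample z-test.

    Uses n = 2 * (z_{1-alpha/2} + z_{power})^2 * sd^2 / delta^2 and
    assumes independent observations with equal variance in both arms.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2)


def design_effect(cluster_size, icc):
    """Variance inflation for clustered assignment: 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc


baseline = 1.80        # completed trips per courier-hour (from the brief)
sd = 1.10              # standard deviation (from the brief)
relative_mde = 0.02    # ASSUMPTION: 2% relative lift, not given in the brief
delta = baseline * relative_mde

n_iid = n_per_arm(delta, sd)
# ASSUMPTION: 50 courier-hours per cluster-window and ICC of 0.05 are placeholders.
n_clustered = ceil(n_iid * design_effect(cluster_size=50, icc=0.05))

print(f"per arm, independent observations: {n_iid}")
print(f"per arm, after design effect:      {n_clustered}")
```

Halving the MDE roughly quadruples the requirement, so the choice of MDE and the realized intra-cluster correlation drive feasibility within the 28-day window far more than the choice of test statistic.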
Deliverables
- Propose an experiment design that explicitly addresses interference in the Uber Eats marketplace, including the unit of randomization and whether you would use geo-clusters, courier-level assignment, or a switchback design.
- Define the primary metric, 2-4 guardrails, and a clear minimum detectable effect (MDE). Be explicit about the unit of analysis.
- Calculate the required sample size using the numbers above and translate it into a feasible duration under your proposed design.
- Pre-register the analysis plan: statistical test, peeking policy, multiple-comparisons treatment, and how you will detect sample ratio mismatch or contamination.
- State a ship / don’t-ship / iterate rule that respects guardrails, and explain how you would interpret a result if the primary metric improves but control is clearly affected by treatment spillovers.
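The sample ratio mismatch check named in the analysis-plan deliverable can be sketched as a chi-square goodness-of-fit test on the count of randomization units per arm. This is a minimal illustration, assuming a two-arm design with a planned 50/50 split; the function name `srm_p_value`, the example counts, and the 0.001 alert threshold are hypothetical choices, not part of the brief. The counts should be taken at the unit of randomization (for example region-time windows in a switchback, or geo-clusters), not at the trip level.

```python
from math import erfc, sqrt


def srm_p_value(n_treat, n_control, expected_treat_share=0.5):
    """P-value of a 1-degree-of-freedom chi-square test for sample ratio mismatch.

    Compares observed unit counts per arm with the counts implied by the
    planned allocation share.
    """
    total = n_treat + n_control
    exp_treat = total * expected_treat_share
    exp_control = total - exp_treat
    chi2 = ((n_treat - exp_treat) ** 2 / exp_treat
            + (n_control - exp_control) ** 2 / exp_control)
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    return erfc(sqrt(chi2 / 2))


SRM_ALPHA = 0.001  # ASSUMPTION: a strict threshold chosen for illustration

# Hypothetical counts of randomization units observed in each arm.
p = srm_p_value(n_treat=5200, n_control=4800)
print("SRM suspected" if p < SRM_ALPHA else "no SRM detected", f"(p = {p:.2g})")
```

A low p-value here signals an assignment or logging problem and should pause interpretation of the primary metric. It does not by itself detect spillover, since contamination of control can occur with perfectly balanced arms.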