You’re a data scientist at TaskRabbit, a two-sided marketplace with ~3.5M monthly active customers and 220k active Taskers across North America. The pricing team proposes a +4% increase to the customer-facing hourly price (implemented via a higher service fee) to improve contribution margin. Ops is worried about operational fallout: if Taskers see fewer bookings (or lower effective hourly earnings), they may churn, causing longer wait times and cancellations.
A 14-day randomized experiment was run in 6 large metros. Customers were randomized at the customer-id level to see either the old price (control) or the +4% price (treatment). Taskers were not explicitly randomized; they receive jobs based on matching/availability, but the marketplace is large enough that the team assumes interference is limited.
Primary metric: booking conversion rate (the share of customers who view a task page and complete a booking within 24 hours).
Guardrail metric: 7-day Tasker churn (the share of Taskers who were active in the prior 28 days and then have zero task acceptances in the subsequent 7 days).
| Metric | Control | Treatment | Notes |
|---|---|---|---|
| Customer task-page viewers (n) | 182,640 | 181,955 | Unique customers with at least one task-page view |
| Customer bookings (x) | 27,396 | 26,301 | Booking within 24h |
| Taskers at risk of churn (m) | 18,420 | 18,510 | Taskers who received ≥1 eligible lead during test |
| Tasker churn events (y) | 1,468 | 1,612 | Churn definition above |
| Significance level (α) | - | - | 0.05 |
Assume you can use large-sample normal approximations for proportions.
You need to decide whether the +4% price change should be rolled out, balancing statistical significance with operational risk. Specifically, you must quantify:
Bookings impact (two-proportion z-test): estimate the absolute and relative change in booking conversion and test whether the difference is significant at α = 0.05.
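A minimal sketch of the pooled two-proportion z-test applied to the numbers in the table above (the function name and printout format are illustrative, not part of the exercise):

```python
from math import sqrt
from statistics import NormalDist

def two_prop_z(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns (diff, z, two-sided p)."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)                      # pooled proportion under H0
    se = sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    p_two = 2 * (1 - NormalDist().cdf(abs(z)))     # two-sided p-value
    return p2 - p1, z, p_two

# Viewers and bookings from the results table (control vs. +4% treatment)
diff, z, p_two = two_prop_z(27396, 182640, 26301, 181955)
print(f"diff = {diff:+.4%}, z = {z:.2f}, p = {p_two:.1e}")
```

On these numbers, conversion falls from 15.00% to roughly 14.45% (about −0.55pp), giving z ≈ −4.6 — well past the α = 0.05 threshold.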
Tasker churn guardrail (one-sided test): test whether 7-day Tasker churn is higher under treatment than under control (H₁: churn_T > churn_C) at α = 0.05.
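The guardrail can be sketched the same way, with an upper-tail alternative since we only care about churn going up (again, names here are illustrative):

```python
from math import sqrt
from statistics import NormalDist

def one_sided_churn_z(y_c, m_c, y_t, m_t):
    """Pooled one-sided z-test of H1: treatment churn > control churn."""
    q_c, q_t = y_c / m_c, y_t / m_t
    q = (y_c + y_t) / (m_c + m_t)                  # pooled churn rate under H0
    se = sqrt(q * (1 - q) * (1 / m_c + 1 / m_t))
    z = (q_t - q_c) / se
    p_one = 1 - NormalDist().cdf(z)                # upper-tail p-value
    return q_t - q_c, z, p_one

# Taskers at risk and churn events from the results table
diff, z, p_one = one_sided_churn_z(1468, 18420, 1612, 18510)
```

Churn rises from ~7.97% to ~8.71%, giving z ≈ 2.57 and one-sided p ≈ 0.005 — the guardrail also flags a significant effect.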
Operational decision framing: translate the conversion and churn effects into a ship / no-ship recommendation, weighing the expected margin gain against Tasker-side risk (longer wait times, more cancellations).
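One way to ground that framing is to put a confidence interval around the churn lift, so the decision weighs the plausible range of Tasker-side damage rather than a point estimate. A sketch using an unpooled standard error (appropriate for estimation, as opposed to the pooled SE used for testing):

```python
from math import sqrt
from statistics import NormalDist

# Churn rates from the results table
y_c, m_c, y_t, m_t = 1468, 18420, 1612, 18510
q_c, q_t = y_c / m_c, y_t / m_t

# 95% CI on the absolute churn difference (unpooled SE, normal approx.)
se = sqrt(q_c * (1 - q_c) / m_c + q_t * (1 - q_t) / m_t)
z_crit = NormalDist().inv_cdf(0.975)
lo, hi = (q_t - q_c) - z_crit * se, (q_t - q_c) + z_crit * se

# Relative lift in churn, useful for sizing the supply-side hit
rel_lift = (q_t - q_c) / q_c
```

This works out to an absolute churn increase of roughly +0.18pp to +1.30pp (95% CI), i.e. about a +9% relative increase at the point estimate — the kind of number Ops would weigh against the margin gain from the price change.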
Power / sample sizing: assess whether the experiment was adequately powered for the guardrail, and estimate the sample size needed to detect a minimum effect of interest.
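A standard sample-size formula for comparing two proportions can anchor the power discussion. The baseline (~8% churn) comes from the table; the 0.5pp minimum detectable effect below is an assumed MDE chosen for illustration, not something the exercise specifies:

```python
from math import ceil
from statistics import NormalDist

def n_per_arm(p_base, mde, alpha=0.05, power=0.80, two_sided=True):
    """Sample size per arm to detect an absolute difference `mde`
    between p_base and p_base + mde (normal approximation)."""
    nd = NormalDist()
    z_a = nd.inv_cdf(1 - alpha / 2) if two_sided else nd.inv_cdf(1 - alpha)
    z_b = nd.inv_cdf(power)
    p1, p2 = p_base, p_base + mde
    var = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_a + z_b) ** 2 * var / mde ** 2)

# Illustrative: detect a +0.5pp absolute churn increase off an ~8% base
n_churn = n_per_arm(0.08, 0.005)
```

Under these assumptions, detecting a 0.5pp churn increase needs roughly 47–48k Taskers per arm at 80% power, versus the ~18.5k actually observed — so the guardrail arm is underpowered for effects that small, even though the observed (larger) effect did reach significance.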