Business Context
WayDrive operates a Level-4 autonomous ride-hailing fleet in Phoenix and Seattle, logging ~3.5M autonomous miles per month. Safety leadership is preparing a regulator-facing report and wants a statistically defensible answer to one question: does the vehicle disengagement rate (human takeovers per autonomous mile) materially worsen in adverse weather?
A disengagement is a rare event, and exposure differs by weather (the fleet drives far fewer miles in snow than in clear conditions). You also suspect confounding: night driving is more common during rain in Seattle, and night itself may increase disengagement risk.
Problem Statement
Using the aggregated telemetry below, (1) test whether disengagement rates differ across weather conditions, (2) estimate the size of the effect with confidence intervals, and (3) propose a model-based approach that adjusts for a key confounder (night vs day). You should justify which statistical tests you choose and why they fit the data-generating process.
Given Data
Aggregated by weather (all cities combined)
| Weather | Autonomous miles (exposure) | Disengagements (count) |
|---|---|---|
| Clear | 1,200,000 | 96 |
| Rain | 300,000 | 45 |
| Snow | 50,000 | 20 |
Assume disengagements are independent conditional on exposure, and that within each weather bucket the rate is approximately constant.
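As a starting point, the raw rates implied by the table can be computed directly. The snippet below is a minimal sketch; the dictionary layout and the helper name `rate_per_100k` are illustrative choices, not part of the problem.

```python
# Aggregated counts from the weather table: (miles, disengagements)
weather = {
    "Clear": (1_200_000, 96),
    "Rain": (300_000, 45),
    "Snow": (50_000, 20),
}

def rate_per_100k(miles, events):
    """Events per 100,000 miles of exposure."""
    return events / miles * 100_000

rates = {name: rate_per_100k(m, d) for name, (m, d) in weather.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.1f} disengagements per 100k miles")
```

This gives 8.0, 15.0, and 40.0 disengagements per 100k miles for Clear, Rain, and Snow respectively. These are point estimates only; the Snow figure rests on just 20 events.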
Stratified for confounding check (Clear vs Rain only)
| Weather | Time | Miles | Disengagements |
|---|---|---|---|
| Clear | Day | 800,000 | 56 |
| Clear | Night | 400,000 | 40 |
| Rain | Day | 120,000 | 12 |
| Rain | Night | 180,000 | 33 |
Use significance level α = 0.05.
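A quick way to see whether night driving confounds the Rain effect is to compare the crude Rain-vs-Clear rate ratio with the stratum-specific ratios from the stratified table. The sketch below also computes a Mantel-Haenszel pooled rate ratio, which is one common stratified estimator and is swapped in here purely for illustration; the problem itself asks for a regression-based adjustment. All variable names are hypothetical.

```python
# (weather, time): (miles, disengagements) from the stratified table
strata = {
    ("Clear", "Day"): (800_000, 56),
    ("Clear", "Night"): (400_000, 40),
    ("Rain", "Day"): (120_000, 12),
    ("Rain", "Night"): (180_000, 33),
}
times = ("Day", "Night")

def rate(miles, events):
    return events / miles

def pooled(weather):
    """Collapse a weather condition over time of day."""
    miles = sum(strata[(weather, t)][0] for t in times)
    events = sum(strata[(weather, t)][1] for t in times)
    return miles, events

# Share of miles driven at night: the imbalance that creates confounding
night_share = {w: strata[(w, "Night")][0] / pooled(w)[0] for w in ("Clear", "Rain")}

# Crude ratio ignores time of day; stratum ratios hold it fixed
crude_rr = rate(*pooled("Rain")) / rate(*pooled("Clear"))
stratum_rr = {t: rate(*strata[("Rain", t)]) / rate(*strata[("Clear", t)]) for t in times}

# Mantel-Haenszel rate ratio for person-time (here: mile) data
num = den = 0.0
for t in times:
    m1, d1 = strata[("Rain", t)]
    m0, d0 = strata[("Clear", t)]
    total = m1 + m0
    num += d1 * m0 / total
    den += d0 * m1 / total
mh_rr = num / den

print(f"night share of miles: {night_share}")
print(f"crude RR = {crude_rr:.3f}, by stratum = {stratum_rr}, MH RR = {mh_rr:.3f}")
```

Rain miles are 60% night versus about 33% for Clear, and the crude ratio (1.875) is larger than the pooled stratified ratio (about 1.68), which is consistent with night driving inflating the unadjusted Rain effect.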
Requirements
- Compute disengagement rates per 100,000 miles for each weather condition.
- Overall hypothesis test (3 groups): Choose an appropriate test to evaluate whether rates differ across Clear/Rain/Snow, and carry it out.
- Targeted comparison: Test Rain vs Clear using a rate-based test (not a proportion test). Report the rate ratio and a 95% CI.
- Model with confounding adjustment: Using the stratified table, fit/describe a regression that adjusts for night vs day while estimating the Rain effect. Provide the adjusted Rain-vs-Clear rate ratio.
- Power / planning: If leadership wants to detect a 25% increase in disengagement rate in Rain vs Clear with 80% power (two-sided α=0.05), approximate how many Rain miles are needed, assuming the Clear rate is the baseline and Clear miles are effectively unlimited.
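The unadjusted requirements above can be sketched with the standard library alone. The choices below are assumptions rather than the only defensible answer: a Pearson chi-square test of rate homogeneity (expected counts proportional to miles) for the three-group test, a Wald interval on the log rate ratio for Rain vs Clear, and a normal-approximation sample-size formula that treats the Clear rate as a known constant. A likelihood-ratio test or an exact conditional binomial test would be reasonable alternatives.

```python
import math
from statistics import NormalDist

ALPHA = 0.05
counts = {"Clear": 96, "Rain": 45, "Snow": 20}
miles = {"Clear": 1_200_000, "Rain": 300_000, "Snow": 50_000}

# 1) Overall test: under H0 (one common rate) each group's expected
#    count is the total count times its share of total miles.
total_d, total_m = sum(counts.values()), sum(miles.values())
expected = {k: total_d * miles[k] / total_m for k in counts}
chi2 = sum((counts[k] - expected[k]) ** 2 / expected[k] for k in counts)
p_overall = math.exp(-chi2 / 2)  # chi-square survival function, valid for df = 2 only

# 2) Rain vs Clear: rate ratio with a Wald CI on the log scale.
#    For Poisson counts, Var(log RR) is approximately 1/d_rain + 1/d_clear.
rr = (counts["Rain"] / miles["Rain"]) / (counts["Clear"] / miles["Clear"])
se_log_rr = math.sqrt(1 / counts["Rain"] + 1 / counts["Clear"])
z_crit = NormalDist().inv_cdf(1 - ALPHA / 2)
ci = (rr * math.exp(-z_crit * se_log_rr), rr * math.exp(z_crit * se_log_rr))
z_stat = math.log(rr) / se_log_rr
p_rr = 2 * (1 - NormalDist().cdf(abs(z_stat)))

# 3) Planning: Rain miles needed to detect a 25% rate increase with 80% power,
#    treating the Clear rate as a fixed baseline (one-sample Poisson approximation).
lam0 = counts["Clear"] / miles["Clear"]
lam1 = 1.25 * lam0
z_power = NormalDist().inv_cdf(0.80)
rain_miles = (z_crit * math.sqrt(lam0) + z_power * math.sqrt(lam1)) ** 2 / (lam1 - lam0) ** 2

print(f"chi2 = {chi2:.2f} (df=2), p = {p_overall:.2e}")
print(f"RR = {rr:.3f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f}), p = {p_rr:.4f}")
print(f"Rain miles needed: {rain_miles:,.0f}")
```

Under these assumptions the chi-square statistic should come out near 55 on 2 degrees of freedom, the Rain-vs-Clear rate ratio is 1.875 with a 95% CI of roughly (1.32, 2.67), and the planning formula suggests on the order of 1.7 million Rain miles. The expected Snow count under the null is only about 5, so the chi-square approximation is borderline for that cell and worth flagging.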
Assumptions & Constraints
- Disengagement counts are rare and scale with exposure ⇒ a Poisson process (or Poisson GLM) is reasonable.
- Miles are measured without error; disengagement labeling is consistent across conditions.
- Independence may be violated by clustering (same route/vehicle). You may mention how you’d address this (e.g., cluster-robust SEs), but complete the calculations under the Poisson assumption.
- Multiple comparisons: if you do pairwise tests beyond Rain vs Clear, state how you’d control error (e.g., Holm).
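Under the Poisson assumption, the confounder-adjusted estimate can be obtained from a log-linear model with log(miles) as an offset: log E[count] = log(miles) + b0 + b1·rain + b2·night, so that exp(b1) is the adjusted Rain-vs-Clear rate ratio. The sketch below fits this model to the four stratified cells by Newton-Raphson using only the standard library. The helper names (`solve`, `score_and_info`) are hypothetical, and the standard errors are model-based, not cluster-robust.

```python
import math

# (rain, night, miles, disengagements) from the stratified table
cells = [(0, 0, 800_000, 56), (0, 1, 400_000, 40),
         (1, 0, 120_000, 12), (1, 1, 180_000, 33)]

def solve(a, b):
    """Solve a small linear system a @ x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

X = [[1.0, float(rain), float(night)] for rain, night, _, _ in cells]
log_miles = [math.log(m) for _, _, m, _ in cells]  # offset term
y = [float(d) for _, _, _, d in cells]

def score_and_info(beta):
    """Poisson log-likelihood gradient and Fisher information at beta."""
    mu = [math.exp(o + sum(b * x for b, x in zip(beta, row)))
          for o, row in zip(log_miles, X)]
    score = [sum(row[j] * (yi - mi) for row, yi, mi in zip(X, y, mu)) for j in range(3)]
    info = [[sum(row[j] * row[k] * mi for row, mi in zip(X, mu)) for k in range(3)]
            for j in range(3)]
    return score, info

# Start from the pooled rate with no Rain or Night effect
beta = [math.log(sum(y) / sum(m for _, _, m, _ in cells)), 0.0, 0.0]
for _ in range(50):
    score, info = score_and_info(beta)
    step = solve(info, score)
    beta = [b + s for b, s in zip(beta, step)]
    if max(abs(s) for s in step) < 1e-10:
        break

# Model-based variance of b1 is the (rain, rain) entry of the inverse information
_, info = score_and_info(beta)
se_rain = math.sqrt(solve(info, [0.0, 1.0, 0.0])[1])
adj_rr = math.exp(beta[1])
night_rr = math.exp(beta[2])
ci = (math.exp(beta[1] - 1.96 * se_rain), math.exp(beta[1] + 1.96 * se_rain))

print(f"adjusted Rain RR = {adj_rr:.3f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"Night RR = {night_rr:.3f}")
```

The adjusted Rain-vs-Clear rate ratio should land near 1.67, below the crude 1.875, with night driving carrying its own rate ratio of roughly 1.5. With vehicle- or route-level data, the same model would typically be refit in a statistics package with cluster-robust standard errors; that step is not shown here because the aggregated table carries no cluster identifiers.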