You’re a data scientist at RideShield, a rideshare + insurance partner that prices pay-as-you-drive policies for ~600k active drivers across the US. The company has a “driver safety score” model that ingests telematics events (hard brakes, rapid accelerations, speeding) and outputs a daily risk score used to set premiums and trigger coaching.
A problem surfaced: drivers in San Francisco (SF) are consistently scoring worse than drivers in Phoenix (PHX), and the operations team suspects this is partly due to environmental differences (dense traffic, hills, lower average speeds, more stop-and-go) rather than true driver skill. Leadership wants a normalization approach so that a “good driver” in SF is comparable to a “good driver” in PHX.
You are given a simplified dataset aggregated at the driver-month level. For each driver-month you have:
- events: number of safety-relevant events (hard brake OR harsh accel OR speeding)
- miles: miles driven that month
- night_miles_share: fraction of miles driven at night (0–1)
- rain_hours: total hours of rain encountered while driving that month

The business wants a single normalized metric that can be used to compare drivers across cities and to decide who gets coaching.
Assume you sampled a large set of driver-months from each city and computed the following summary statistics for the raw event rate per 100 miles:
| City | Driver-months (n) | Mean miles per driver-month | Mean events per driver-month | Mean raw rate (events/100 mi) | SD of raw rate |
|---|---|---|---|---|---|
| SF | 18,420 | 612.7 | 19.8 | 3.23 | 1.41 |
| PHX | 16,050 | 701.4 | 15.2 | 2.17 | 1.02 |
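As a rough illustration of the simple distributional approach, the sketch below converts a raw rate into a within-city z-score using the means and standard deviations from the table. The function names and the example driver-month (18 events over 600 miles) are hypothetical and chosen only for illustration.

```python
# Summary statistics copied from the table above (raw events per 100 miles).
CITY_STATS = {
    "SF": {"mean": 3.23, "sd": 1.41},
    "PHX": {"mean": 2.17, "sd": 1.02},
}


def raw_rate_per_100mi(events, miles):
    """Raw event rate, expressed per 100 miles driven."""
    return 100.0 * events / miles


def city_z_score(events, miles, city):
    """Standardize a driver-month's raw rate against its own city's distribution."""
    stats = CITY_STATS[city]
    return (raw_rate_per_100mi(events, miles) - stats["mean"]) / stats["sd"]


# Hypothetical driver-month: 18 events over 600 miles (raw rate 3.0 per 100 mi).
z_sf = city_z_score(18, 600, "SF")    # below the SF mean, so negative
z_phx = city_z_score(18, 600, "PHX")  # above the PHX mean, so positive
print(f"SF z = {z_sf:.2f}, PHX z = {z_phx:.2f}")
```

The same raw rate lands below average in SF and above average in PHX, which is the effect a within-city normalization is meant to capture. It does not, by itself, separate environment from driver skill.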
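You also fit (from historical data across all cities) a Poisson GLM for event counts with a log link, exposure (miles) as an offset, and two covariates:

$$\log \mathbb{E}[\text{events}] = \log(\text{miles}) + \beta_0 + \beta_1 \cdot \text{night\_miles\_share} + \beta_2 \cdot \text{rain\_hours}$$

Estimated coefficients (treat as known for this question):
| Parameter | Estimate |
|---|---|
| $\beta_0$ (intercept) | -3.650 |
| $\beta_1$ (night_miles_share) | 0.780 |
| $\beta_2$ (rain_hours) | 0.035 |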
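A minimal sketch of how such a model produces an expected event count. It assumes a log link with log(miles) as the offset, and it assumes the three estimates map, in order, to the intercept, night_miles_share, and rain_hours. The constant and function names are hypothetical.

```python
import math

# Assumed mapping of the three estimates to model terms (see the lead-in).
BETA_INTERCEPT = -3.650
BETA_NIGHT = 0.780   # per unit of night_miles_share (0 to 1)
BETA_RAIN = 0.035    # per hour of rain encountered while driving


def expected_events(miles, night_miles_share, rain_hours):
    """Expected event count under a Poisson GLM with log(miles) as the offset."""
    log_rate_per_mile = (
        BETA_INTERCEPT
        + BETA_NIGHT * night_miles_share
        + BETA_RAIN * rain_hours
    )
    return miles * math.exp(log_rate_per_mile)


# Baseline rate implied by the intercept alone: no night driving, no rain.
baseline_per_100mi = expected_events(100, 0.0, 0.0)
print(f"baseline expected events per 100 miles: {baseline_per_100mi:.2f}")
```

Under these assumptions the intercept implies roughly 2.6 events per 100 miles, which is on the same scale as the raw city means in the summary table.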
For a particular driver-month you want to score:
| City | miles | events | night_miles_share | rain_hours |
|---|---|---|---|---|
| SF | 640 | 22 | 0.28 | 14 |
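One way to sketch the "expected vs observed" idea for this driver-month is to compute the model's expected count, the observed/expected ratio, and a one-sided Poisson tail probability. This again assumes a log(miles) offset and the coefficient order intercept, night_miles_share, rain_hours. All names are hypothetical, and the tail probability is only reported, not compared against a threshold.

```python
import math

# Assumed coefficient mapping: intercept, night_miles_share, rain_hours.
BETA_INTERCEPT, BETA_NIGHT, BETA_RAIN = -3.650, 0.780, 0.035

# Driver-month from the table above.
miles, events, night_miles_share, rain_hours = 640, 22, 0.28, 14

# Expected events given this driver's exposure and conditions.
expected = miles * math.exp(
    BETA_INTERCEPT + BETA_NIGHT * night_miles_share + BETA_RAIN * rain_hours
)

# Observed/expected ratio: values below 1 mean fewer events than conditions predict.
oe_ratio = events / expected


def poisson_cdf(k, mu):
    """P(X <= k) for X ~ Poisson(mu), summed term by term."""
    term = math.exp(-mu)  # P(X = 0)
    total = term
    for i in range(1, k + 1):
        term *= mu / i    # P(X = i) from P(X = i - 1)
        total += term
    return total


# Lower-tail probability of seeing this few events (or fewer) under the model.
p_lower = poisson_cdf(events, expected)
print(f"expected = {expected:.2f}, O/E = {oe_ratio:.3f}, P(X <= {events}) = {p_lower:.4f}")
```

Because the ratio conditions on miles, night driving, and rain, it is comparable across cities in a way the raw rate is not, to the extent the model captures the relevant environmental differences.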
Use significance level .
Design and compute a normalization method that makes driving performance comparable across cities. You should address both simple distributional normalization and a model-based “expected vs observed” normalization that accounts for driving conditions.