North Star for Pedestrian Detection Safety

Business Context

You’re the analytics lead embedded with Waymo’s Perception team, specifically the pedestrian detection squad powering an L4 autonomous ride-hailing service operating in Phoenix and San Francisco. The fleet drives ~1.8M autonomous miles/week and completes ~220k rider trips/week. Safety regulators and city partners require transparent reporting, and internally the company has a quarterly goal to reduce safety-critical events without degrading rider experience (e.g., excessive hard braking).

Over the last month, the team shipped two changes: (1) a new camera model with different noise characteristics and (2) a model update that improved offline mAP on a benchmark dataset. However, the on-road safety review board is concerned: simulated “near-miss” events increased 9% week-over-week in the closed-loop simulator, while on-road disengagements stayed flat. Engineering argues the model is “better” per offline metrics; Safety argues the real-world risk may be rising.

Your PM asks you to define a single primary metric (a North Star KPI) for the pedestrian detection team that can be used in weekly business reviews and release gates. The metric must: (a) align with real-world safety risk, (b) be measurable with available data at scale, (c) be robust to changes in fleet mix and geography, and (d) be hard to game by simply changing thresholds.

Data Available

Source	What it contains	Granularity
`perception_detections`	model version, timestamp, object_id, class=pedestrian, confidence score, 3D box, track continuity	per frame / per track
`sensor_fusion_groundtruth`	human-labeled clips for sampled miles; pedestrian presence, location, occlusion, distance	per clip
`planner_events`	TTC (time-to-collision), hard brake, yield/stop decisions, predicted trajectories	per event
`autonomy_disengagements`	disengagement reason codes, speed, location, preceding objects	per disengagement
`fleet_miles`	miles driven by city, time of day, weather proxy, road type	per trip / per segment
`sim_closed_loop_results`	scenario id, near-miss flags, collision flags, TTC distributions	per simulation run

What You Need To Produce

Define the primary metric for pedestrian detection (one metric, not a dashboard) that you would put in the release gate.
Specify exact inclusion/exclusion rules (e.g., what counts as a pedestrian, distance bands, occlusion, stationary vs moving, construction workers).
Provide the calculation formula and how you would normalize it (per mile, per encounter, per pedestrian-crossing event, etc.).
Propose a decomposition that lets the team diagnose movement in the primary metric into actionable drivers.
Provide benchmarks/targets: what is “good,” what is “regression,” and what triggers a rollback.
List guardrails you would monitor to ensure improving the metric doesn’t harm rider experience or create new safety issues.
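
As one illustration of the normalization choice raised above (per mile vs. per encounter), the sketch below compares the two on made-up weekly counts. It is a minimal sketch, not a prescribed answer: `WeeklyCounts`, the field names, and every number are hypothetical. Only the ~9,000 labeled miles figure follows from the prompt (0.5% of ~1.8M weekly miles).

```python
from dataclasses import dataclass


@dataclass
class WeeklyCounts:
    # All fields are hypothetical illustrations, not columns from the tables above.
    missed_pedestrians: int   # labeled in-scope pedestrians the model missed
    encounters: int           # labeled in-scope pedestrian encounters
    labeled_miles: float      # human-labeled miles in the week


def rate_per_encounter(c: WeeklyCounts) -> float:
    """Misses divided by pedestrian encounters (exposure = pedestrians seen)."""
    return c.missed_pedestrians / c.encounters


def rate_per_1k_miles(c: WeeklyCounts) -> float:
    """Misses per 1,000 labeled miles (exposure = distance driven)."""
    return 1000.0 * c.missed_pedestrians / c.labeled_miles


# Same labeled mileage, but the second week has twice the pedestrian density
# (for example, more downtown driving). The per-encounter miss rate is identical.
suburban_heavy_week = WeeklyCounts(missed_pedestrians=18, encounters=6000, labeled_miles=9000.0)
downtown_heavy_week = WeeklyCounts(missed_pedestrians=36, encounters=12000, labeled_miles=9000.0)

for name, week in [("suburban-heavy", suburban_heavy_week), ("downtown-heavy", downtown_heavy_week)]:
    print(name, rate_per_encounter(week), rate_per_1k_miles(week))
```

In this made-up example the per-mile rate doubles between the two weeks even though the model misses the same fraction of pedestrians, which is one reason a candidate might prefer an encounter-based denominator.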

Constraints:

Labels are expensive: only ~0.5% of miles are human-labeled each week (roughly 9,000 of the ~1.8M weekly miles).
The metric must be stable enough to compare week-over-week despite changes in route mix (downtown vs suburban).
You must support both on-road and simulation evaluation, but the KPI should reflect real-world risk.
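
The route-mix constraint above can be approached in several ways. One common option is direct standardization: compute the rate within each stratum, then combine strata with fixed reference weights instead of the week's actual mix. The sketch below assumes a per-encounter miss rate and two strata. The stratum names, the weights, and all counts are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical fixed reference mix; in practice this would be frozen for a quarter.
REFERENCE_WEIGHTS: Dict[str, float] = {"downtown": 0.5, "suburban": 0.5}


def crude_rate(counts: Dict[str, Tuple[int, int]]) -> float:
    """Pooled misses / pooled encounters; moves when the route mix moves."""
    misses = sum(m for m, _ in counts.values())
    encounters = sum(e for _, e in counts.values())
    return misses / encounters


def standardized_rate(
    counts: Dict[str, Tuple[int, int]], weights: Dict[str, float]
) -> float:
    """Weighted average of per-stratum rates using fixed reference weights."""
    total_weight = sum(weights.values())
    rate = 0.0
    for stratum, weight in weights.items():
        misses, encounters = counts[stratum]
        if encounters == 0:
            raise ValueError(f"no labeled encounters in stratum {stratum!r}")
        rate += (weight / total_weight) * (misses / encounters)
    return rate


# (misses, encounters) per stratum. Per-stratum rates are identical in both
# weeks (0.005 downtown, 0.001 suburban); only the mix of encounters differs.
week_a = {"downtown": (40, 8000), "suburban": (4, 4000)}
week_b = {"downtown": (20, 4000), "suburban": (8, 8000)}

print(crude_rate(week_a), crude_rate(week_b))
print(standardized_rate(week_a, REFERENCE_WEIGHTS), standardized_rate(week_b, REFERENCE_WEIGHTS))
```

With these made-up counts the crude rate falls from week A to week B purely because driving shifted toward the suburbs, while the standardized rate is the same in both weeks. The trade-off is that sparse strata make the standardized estimate noisy, which matters given the small labeled sample.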

