Predict Pedestrian Crossing Intent On-Vehicle

Business Context

You’re on the Perception & Prediction team at MetroDrive, a robotaxi service operating in San Francisco and Phoenix. The fleet drives 2.5M autonomous miles/week. A disproportionate share of safety-critical disengagements and hard-braking events comes from pedestrian interactions near crosswalks and curb edges. Product and Safety want a model that anticipates pedestrian behavior 1–3 seconds ahead so the planner can slow early, reduce jerk, and avoid last-moment emergency braking.

The model will run on-vehicle (embedded GPU/CPU) and must be robust to occlusions, dense urban scenes, and distribution shift (new intersections, events, weather). False negatives (predicting “won’t cross” when the pedestrian does cross) are safety-critical; false positives cause unnecessary yielding and lengthen ride time.

Dataset

You have logged data from the perception stack (already fused and tracked). Each example corresponds to a pedestrian track segment aligned to the ego vehicle timeline.

Component	Scale / Shape	Examples / Notes
Track sequences	1.8M sequences	20 Hz, variable length 1–6s (padded/truncated)
Kinematics (per timestep)	12 floats	x/y in ego frame, vx/vy, ax/ay, heading, yaw_rate
Scene context (static)	18 floats	distance to nearest crosswalk, curb, sidewalk; lane count; speed limit
Interaction features	10 floats	TTC to ego path, relative bearing, ego speed/accel, gap to nearest vehicle
Semantics	6 categorical	intersection type, crosswalk present, signal state (if available), time-of-day bucket
Labels	binary + time-to-event	whether pedestrian enters ego lane/crosswalk within horizon; time-to-cross (if positive)
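
Because the tracks are variable length and the table says they are padded/truncated, it can help to see one way of doing that. The sketch below is only an illustration: the name pad_or_truncate, the choice to left-pad so the newest frame is always last, and the assumption that the 10 interaction features are logged per timestep alongside the 12 kinematic floats (22 per-timestep features in total) are not stated in the brief.

```python
import numpy as np

HZ = 20                 # sampling rate from the dataset table
MAX_STEPS = 6 * HZ      # longest track: 6 s at 20 Hz = 120 timesteps


def pad_or_truncate(seq: np.ndarray, max_steps: int = MAX_STEPS):
    """Return (padded, mask) for one track of shape (T, F).

    Keeps the most recent `max_steps` frames and left-pads shorter tracks
    with zeros, so the last row is always the latest observation.
    mask[t] is True where the row holds real data.
    """
    seq = seq[-max_steps:]                      # truncate: keep newest frames
    t, f = seq.shape
    padded = np.zeros((max_steps, f), dtype=np.float32)
    mask = np.zeros(max_steps, dtype=bool)
    if t > 0:                                   # guard: padded[-0:] would be the whole array
        padded[-t:] = seq
        mask[-t:] = True
    return padded, mask


# Hypothetical tracks; 22 = 12 kinematic + 10 interaction floats (assumed per timestep).
short = np.ones((30, 22), dtype=np.float32)     # 1.5 s track, gets padded
long_ = np.ones((200, 22), dtype=np.float32)    # 10 s track, gets truncated
x_short, m_short = pad_or_truncate(short)
x_long, m_long = pad_or_truncate(long_)
```

Passing the mask to the model (rather than relying on zeros) keeps padded rows from being read as a stationary pedestrian at the ego origin.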

Additional characteristics:

Positive rate: ~7% of sequences result in a crossing within 3s (highly imbalanced).
Missingness: ~12% of timesteps have partial occlusion (noisy velocity/accel); signal state missing in ~30% of scenes.
Leakage risk: tracks may include frames after the pedestrian has already stepped off the curb if you’re not careful with label alignment.
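
One way to guard against the leakage risk above is to cut every training window strictly before the crossing frame and to derive the label only from frames after the window. The sketch below assumes each track comes with a cross_frame index (the first frame in which the pedestrian is off the curb, or None); that field, the function name make_examples, and the 1 s minimum history are illustrative assumptions rather than part of the logged schema.

```python
HZ = 20
HORIZONS_S = (1.0, 2.0, 3.0)


def make_examples(n_frames, cross_frame, min_history=HZ):
    """Yield (obs_end, labels) pairs for one track.

    obs_end is the index of the last frame the model may see (inclusive).
    labels maps horizon -> 1 if the crossing starts in
    (obs_end, obs_end + horizon * HZ], 0 if it provably does not,
    and None if the track ends too early to know (censored).
    """
    # Never observe the crossing frame itself or anything after it.
    last_obs = n_frames - 1 if cross_frame is None else min(cross_frame - 1, n_frames - 1)
    for obs_end in range(min_history - 1, last_obs + 1):
        labels = {}
        for h in HORIZONS_S:
            limit = obs_end + int(round(h * HZ))
            if cross_frame is not None:
                labels[h] = int(obs_end < cross_frame <= limit)
            elif limit <= n_frames - 1:
                labels[h] = 0          # full horizon observed, no crossing
            else:
                labels[h] = None       # not enough future to confirm a negative
        yield obs_end, labels


# Hypothetical 5 s track that crosses at frame 80, and one that never crosses.
pos = dict(make_examples(n_frames=100, cross_frame=80))
neg = dict(make_examples(n_frames=100, cross_frame=None))
```

Censored labels (None) would typically be masked out of the loss for that horizon rather than treated as negatives, since the track may simply have ended before the pedestrian stepped out.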

Success Criteria

Safety-first recall: At precision ≥ 0.35, achieve recall ≥ 0.85 for “cross within 3s”.
Calibrated probabilities: Expected Calibration Error (ECE) ≤ 0.03 so the planner can use probabilities as costs.
Timeliness: Median time-to-detection (lead time between the model’s first alert-threshold exceedance and the actual crossing) ≥ 1.0s.
On-vehicle performance: p95 inference latency ≤ 20 ms per tracked pedestrian on target hardware (e.g., NVIDIA Orin), with up to 40 concurrent tracks.
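
To make the first three criteria concrete, here is a minimal NumPy sketch of how they could be computed. It assumes equal-width bins for ECE and defines time-to-detection from the first threshold exceedance, even if the score later drops again; both are interpretation choices, and the function names are illustrative.

```python
import numpy as np


def recall_at_precision(y_true, p, min_precision=0.35):
    """Best recall over all score thresholds whose precision >= min_precision."""
    y_true = np.asarray(y_true)
    p = np.asarray(p, dtype=float)
    order = np.argsort(-p, kind="stable")        # highest score first
    y_sorted, p_sorted = y_true[order], p[order]
    tp = np.cumsum(y_sorted)
    precision = tp / np.arange(1, len(y_sorted) + 1)
    recall = tp / max(y_sorted.sum(), 1)
    # Tied scores share one threshold, so only the last index of each tie counts.
    last_of_tie = np.r_[p_sorted[1:] != p_sorted[:-1], True]
    ok = last_of_tie & (precision >= min_precision)
    return float(recall[ok].max()) if ok.any() else 0.0


def expected_calibration_error(y_true, p, n_bins=10):
    """Equal-width ECE: sum over bins of (n_b / N) * |mean(y) - mean(p)|."""
    y_true = np.asarray(y_true, dtype=float)
    p = np.asarray(p, dtype=float)
    bins = np.minimum((p * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        sel = bins == b
        if sel.any():
            ece += sel.mean() * abs(y_true[sel].mean() - p[sel].mean())
    return float(ece)


def time_to_detection(scores, cross_frame, threshold, hz=20):
    """Seconds between the first alert and the crossing; None if never alerted."""
    pre = np.asarray(scores[:cross_frame])       # only frames before the crossing
    hits = np.flatnonzero(pre >= threshold)
    return None if hits.size == 0 else (cross_frame - int(hits[0])) / hz


# Toy data, made up for illustration only.
y = [1, 1, 0, 1, 0, 0, 0, 0]
p = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2, 0.1, 0.05]
r_035 = recall_at_precision(y, p, min_precision=0.35)
r_090 = recall_at_precision(y, p, min_precision=0.90)
ece = expected_calibration_error([0, 1], [0.25, 0.75], n_bins=2)
lead = time_to_detection([0.1] * 50 + [0.6] * 30, cross_frame=80, threshold=0.5)
```

The median in the timeliness criterion would then be taken over the per-track lead times of positive tracks; how to count tracks that never alert (None here) is a policy decision worth stating explicitly in the evaluation plan.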

Constraints

Real-time: streaming inference at 20 Hz; must support batching across tracks.
Interpretability: safety review requires feature/attribution summaries (e.g., “distance-to-curb decreasing + heading toward crosswalk”).
Robustness: handle occlusion and sensor noise; avoid brittle dependence on a single feature like signal state.
Evaluation: must avoid temporal leakage; split by geography + time (hold out intersections and weeks).
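
A geography + time split can be sketched as a pure function of an intersection identifier and a week index. The field names (intersection_id, week), the 20% hold-out fraction, and the decision to discard sequences that share only place or only time with the test set are all assumptions made for illustration.

```python
import hashlib


def assign_split(intersection_id: str, week: int, test_weeks: set,
                 holdout_frac: float = 0.2) -> str:
    """Return 'train', 'test', or 'discard' for one sequence.

    An intersection is held out based on a stable hash of its id, so the
    assignment does not change between runs or as new data arrives.
    """
    digest = hashlib.sha256(intersection_id.encode("utf-8")).hexdigest()
    held_out_place = (int(digest, 16) % 1000) < holdout_frac * 1000
    held_out_time = week in test_weeks
    if held_out_place and held_out_time:
        return "test"       # unseen intersection in an unseen week
    if not held_out_place and not held_out_time:
        return "train"
    return "discard"        # overlaps the test set in place or in time


# Hypothetical usage: weeks 10 and 11 are reserved for testing.
split_a = assign_split("int_0042", week=3, test_weeks={10, 11})
split_b = assign_split("int_0042", week=10, test_weeks={10, 11})
```

Instead of discarding the mixed cells, you could keep them as separate evaluation slices (seen intersection in a new week, new intersection in a seen week) to tell geographic shift apart from temporal drift. A validation set can be carved out the same way with a second hash bucket and a second set of weeks.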

Deliverables

Define the prediction target(s): classification horizon(s) (1s/2s/3s) and optional time-to-cross regression.
Propose a model architecture and training objective that handles imbalance and produces calibrated probabilities.
Specify feature engineering for sequences and context, including handling of missing values and occlusion.
Describe the train/val/test split and cross-validation strategy to prevent leakage.
Provide an evaluation plan with metrics aligned to safety and planning (PR-AUC, recall@precision, calibration, time-to-detection).
Outline a production deployment plan: streaming inference, monitoring, retraining cadence, and rollback triggers.
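
For the training-objective deliverable, one common option (not the only one) is a positive-weighted binary cross-entropy followed by post-hoc calibration on unweighted validation data, since up-weighting the ~7% positives tends to inflate predicted probabilities. The NumPy sketch below uses temperature scaling fitted by grid search; the function names, the grid, and the synthetic data are assumptions for illustration, and a calibrator with a bias term (Platt scaling) may fit better when the weighting shifts the logits.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def weighted_bce(logits, y, pos_weight=1.0):
    """Binary cross-entropy with positive examples up-weighted by pos_weight."""
    p = np.clip(sigmoid(logits), 1e-7, 1 - 1e-7)
    return float(np.mean(-(pos_weight * y * np.log(p) + (1 - y) * np.log(1 - p))))


def fit_temperature(val_logits, val_y, grid=None):
    """Grid-search T > 0 minimising the unweighted NLL of sigmoid(logits / T)."""
    grid = np.linspace(0.25, 5.0, 96) if grid is None else grid
    losses = [weighted_bce(val_logits / t, val_y) for t in grid]
    return float(grid[int(np.argmin(losses))])


# Inverse-frequency weight implied by the ~7% positive rate in the brief.
pos_weight = (1 - 0.07) / 0.07

# Synthetic check: labels drawn from known logits, then the logits are made
# overconfident by a factor of 3. The fitted temperature should undo that.
rng = np.random.default_rng(0)
true_logits = rng.normal(-2.5, 1.5, size=20_000)
y = (rng.random(20_000) < sigmoid(true_logits)).astype(float)
overconfident = 3.0 * true_logits
T = fit_temperature(overconfident, y)
calibrated = sigmoid(overconfident / T)
```

Temperature scaling changes only a single scalar at inference time, so it adds essentially nothing to the on-vehicle latency budget and does not change the ranking of tracks by score.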

