You’re on the perception team at MetroDrive, a ride-hailing and autonomous delivery company operating in 6 major US cities. Your vehicles run an on-device camera model that classifies nearby vehicles into categories used by downstream planning: {car, truck, bus, motorcycle, bicycle, emergency_vehicle}. Missing an emergency vehicle (ambulance, fire truck, police car) is a safety-critical failure: the planner may not yield or may choose an unsafe maneuver. However, emergency vehicles are rare in the training data (a classic long-tail problem), so the model rarely sees them during training.
The current model performs well on common classes but has poor recall on emergency_vehicle, especially at night, in rain, and when lights are partially occluded.
You have a curated dataset from the last 90 days of fleet driving.
| Component | Scale / Details |
|---|---|
| Raw frames | 220M frames (30 FPS video), sampled to 1 FPS for labeling candidates |
| Labeled images | 3.2M images with a primary vehicle label (single-label classification) |
| Classes | car, truck, bus, motorcycle, bicycle, emergency_vehicle |
| Class distribution | car 71%, truck 18%, bus 4.5%, motorcycle 3.2%, bicycle 3.0%, emergency_vehicle 0.3% |
| Features available | image pixels; metadata: city, time_of_day, weather, camera_id, speed, road_type |
| Label noise | ~1–2% overall; higher for emergency vehicles due to ambiguity (e.g., tow trucks with lights) |
| Deployment | On-device (edge GPU), max 15 ms per frame end-to-end for this classifier |
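To get a feel for how severe the imbalance above is, one common starting point is inverse-frequency class weights for the training loss. The snippet below derives them from the stated distribution; the normalization (weights averaging to 1.0) is an illustrative convention, not part of the scenario.

```python
# Inverse-frequency class weights from the class distribution in the table.
# Normalized so the weights average to 1.0 across classes (one common
# convention; other normalizations are equally valid).
freqs = {
    "car": 0.71, "truck": 0.18, "bus": 0.045,
    "motorcycle": 0.032, "bicycle": 0.030, "emergency_vehicle": 0.003,
}

inv = {c: 1.0 / f for c, f in freqs.items()}
mean_inv = sum(inv.values()) / len(inv)
weights = {c: w / mean_inv for c, w in inv.items()}

for c, w in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{c:18s} weight = {w:.3f}")
```

At a 0.3% prevalence, emergency_vehicle ends up weighted roughly 237x heavier than car (the ratio of their inverse frequencies), which is why pure reweighting often needs to be combined with oversampling or targeted data collection rather than used alone.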
Your goal is to improve performance on the long-tail class without breaking overall system behavior.
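Because the goal is two-sided (lift recall on the tail class without regressing the head classes), any candidate change should be gated on per-class metrics rather than a single aggregate accuracy, which the 71% car share would dominate. A minimal sketch of per-class recall from prediction/label pairs (class names are from the table; the toy data is purely illustrative):

```python
from collections import Counter

CLASSES = ["car", "truck", "bus", "motorcycle", "bicycle", "emergency_vehicle"]

def per_class_recall(y_true, y_pred):
    """Recall per class: correctly predicted instances / true instances."""
    correct = Counter()
    total = Counter()
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return {c: correct[c] / total[c] for c in CLASSES if total[c]}

# Toy example: one of two emergency vehicles is missed.
y_true = ["car", "car", "truck", "emergency_vehicle", "emergency_vehicle"]
y_pred = ["car", "car", "truck", "emergency_vehicle", "truck"]
print(per_class_recall(y_true, y_pred))
# car and truck recall stay at 1.0 while emergency_vehicle drops to 0.5
```

Slicing this same metric by the available metadata (time_of_day, weather) would surface the night/rain failure modes described above.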