You’re on the autonomy perception team at AuroraRide, a robotaxi operator running 3,500 vehicles across Phoenix and Austin. Each vehicle has a forward-facing 8MP RGB camera (30 FPS) and must obey traffic signals with extremely high reliability. A recent safety review found that the stack occasionally confuses red vs yellow in backlit scenes and misses small, distant lights at complex intersections. A single failure can cause a safety incident and immediate regulatory scrutiny.
Your task is to design an ML system that (1) detects traffic lights in camera images and (2) classifies their state (e.g., red/yellow/green/off/unknown). The system will run on-vehicle and must support real-time inference.
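Since the camera runs at 30 FPS, the per-frame latency budget is roughly 33 ms. A minimal sketch of how an on-vehicle latency gate could be checked (the function name and the p99 criterion are illustrative assumptions, not part of the spec):

```python
FPS = 30
BUDGET_MS = 1000.0 / FPS  # ~33.3 ms per frame at 30 FPS

def meets_realtime(latencies_ms, p=0.99):
    # Gate on tail latency rather than the mean: a p99 within
    # budget keeps the pipeline from dropping frames under load.
    ordered = sorted(latencies_ms)
    idx = min(len(ordered) - 1, int(p * len(ordered)))
    return ordered[idx] <= BUDGET_MS
```

In practice the latency samples would come from profiling the full detection + classification pipeline on the target on-vehicle hardware, not from a host machine.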
You have access to a labeled dataset collected from the fleet.
| Component | Details |
|---|---|
| Images | 1.8M frames (day/night/rain), 1920×1080 RGB, 30 FPS sequences but labeled per-frame |
| Labels | Bounding boxes for each visible traffic light + state label per box |
| Classes | red, yellow, green, off (unlit), unknown (occluded/ambiguous) |
| Object size | 40% of boxes are < 20×20 px (distant lights); long-tail of tiny objects |
| Class balance | green 52%, red 33%, yellow 7%, off 5%, unknown 3% |
| Domain shift | 20% frames at night; 8% rain; 5% lens flare; new intersections added weekly |
| Missing/Noisy labels | ~2% boxes have incorrect state due to annotation ambiguity; some frames missing boxes for far lights |
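Given the skew in the class-balance row above (yellow 7%, off 5%, unknown 3%), one common mitigation is weighting the state-classification loss by inverse class frequency. A sketch using the priors from the table (the weighting scheme itself is an assumption, not something the spec prescribes):

```python
# Class priors taken from the dataset table (fraction of all boxes).
PRIORS = {"green": 0.52, "red": 0.33, "yellow": 0.07, "off": 0.05, "unknown": 0.03}

def inverse_freq_weights(priors):
    # Weight each class by 1/prior, then normalize so the
    # weights average to 1 (keeps the overall loss scale stable).
    raw = {c: 1.0 / p for c, p in priors.items()}
    mean = sum(raw.values()) / len(raw)
    return {c: w / mean for c, w in raw.items()}

weights = inverse_freq_weights(PRIORS)
```

The resulting dict could be passed (as a tensor, in class order) to e.g. a cross-entropy loss; focal loss or class-balanced sampling are alternative levers for the same imbalance.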
You also have a held-out “regression suite”: 12,000 frames from the last 2 weeks at newly mapped intersections (unseen during training).
Targets: red vs green (binary), and ≥ 92% macro-F1 across all 5 states. You may assume you can use PyTorch and common detection toolchains, but you must be explicit about the exact metrics, thresholds, and release gates you would use before shipping to the fleet.
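The ≥ 92% macro-F1 gate across the 5 states can be expressed as an executable release check. A minimal sketch computing macro-F1 from scratch (function names are illustrative; in practice scikit-learn's `f1_score(average="macro")` would do the same):

```python
CLASSES = ["red", "yellow", "green", "off", "unknown"]
GATE = 0.92  # release gate from the spec: >= 92% macro-F1 over all 5 states

def macro_f1(y_true, y_pred, classes):
    # Macro-F1: unweighted mean of per-class F1, so rare classes
    # (yellow, off, unknown) count as much as green and red.
    f1s = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

def passes_gate(y_true, y_pred):
    return macro_f1(y_true, y_pred, CLASSES) >= GATE
```

This check would run over the 12,000-frame regression suite (matched detections only; how unmatched boxes are scored is a design decision the answer should make explicit).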