Warehouse Object Detection for Robotics

Business Context

You’re interviewing for an ML role on the FulfillBot team at a large e-commerce logistics company operating 120 fulfillment centers. The company is rolling out autonomous mobile robots that navigate aisles and pick totes. A key safety and productivity requirement is real-time object detection from an RGB camera mounted on the robot: detect people, forklifts, pallets, totes, barcode labels, and spill hazards.

The system runs on an edge GPU (Jetson-class) and must avoid collisions (safety) while keeping throughput high (revenue). A missed detection of a person is a serious safety incident; false positives reduce speed and cause traffic jams.

Dataset

You are given a labeled dataset collected from 3 months of operations across 30 warehouses.

Feature Group	Details
Images	1.8M RGB frames at 1280×720 (camera runs at 10 FPS; sampled to 1 FPS for labeling); motion blur and low light are common
Labels	Bounding boxes + class (6 classes). Some frames contain multiple objects; occlusion is frequent
Environments	Narrow aisles, reflective floors, seasonal lighting changes, camera vibration
Split metadata	`warehouse_id`, `camera_id`, `timestamp`, `zone_type` (aisle/dock), `shift` (day/night)
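
The brief calls out low light and motion blur as common, so photometric augmentation is one natural lever for the day/night robustness target. The sketch below assumes frames arrive as HxWx3 uint8 NumPy arrays. It darkens frames with a gamma curve plus Gaussian noise and applies a one-direction box blur as a crude stand-in for motion blur. The function names and all parameter ranges are hypothetical choices for illustration, not tuned values. Neither transform moves objects, so box labels are left unchanged (for the blur this is an approximation).

```python
import numpy as np


def simulate_low_light(img: np.ndarray, gamma: float, gain: float,
                       noise_std: float, rng: np.random.Generator) -> np.ndarray:
    """Darken an HxWx3 uint8 frame with a gamma curve and a gain, then add Gaussian noise."""
    x = img.astype(np.float32) / 255.0
    x = gain * np.power(x, gamma)  # gamma > 1 and gain < 1 both darken the image
    x = x + rng.normal(0.0, noise_std, size=x.shape)  # crude stand-in for sensor noise
    return (np.clip(x, 0.0, 1.0) * 255.0).round().astype(np.uint8)


def horizontal_motion_blur(img: np.ndarray, kernel_size: int) -> np.ndarray:
    """Box-average each pixel with its neighbors along the x axis (one-direction blur)."""
    if kernel_size <= 1:
        return img.copy()
    left = kernel_size // 2
    right = kernel_size - 1 - left
    x = img.astype(np.float32)
    # Replicate edge columns so the output keeps the input width.
    padded = np.pad(x, ((0, 0), (left, right), (0, 0)), mode="edge")
    out = np.zeros_like(x)
    for k in range(kernel_size):
        out += padded[:, k:k + x.shape[1], :]
    out /= kernel_size
    return out.round().astype(np.uint8)


def augment_photometric(img: np.ndarray, rng: np.random.Generator,
                        p_dark: float = 0.5, p_blur: float = 0.3) -> np.ndarray:
    """Randomly apply the two transforms; parameter ranges are illustrative guesses."""
    out = img
    if rng.random() < p_dark:
        out = simulate_low_light(out, gamma=rng.uniform(1.2, 2.5), gain=rng.uniform(0.3, 0.8),
                                 noise_std=rng.uniform(0.0, 0.03), rng=rng)
    if rng.random() < p_blur:
        out = horizontal_motion_blur(out, kernel_size=int(rng.integers(3, 10)))
    return out
```

Reflective floors and vibration would likely call for other transforms (glare, multi-direction blur) that this sketch does not attempt.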

Class distribution (by boxes): person 41%, tote 33%, pallet 15%, forklift 9%, barcode label 1.5%, spill hazard 0.5% (long tail)
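
Given the long tail (spill hazard 0.5% of boxes, barcode labels 1.5%), one commonly used option is repeat-factor sampling as popularized by the LVIS dataset work. Class c gets r(c) = max(1, sqrt(t / f(c))), where f(c) is the fraction of training images containing c, and each image is repeated according to its rarest class. The percentages above are box shares, not the per-image frequencies this formula needs, so f(c) would have to be recomputed from the labels. The threshold t and the helper names below are assumptions for illustration.

```python
import math
import random
from collections import Counter


def class_repeat_factors(image_labels, threshold):
    """r(c) = max(1, sqrt(t / f(c))), with f(c) the fraction of images containing class c."""
    n_images = len(image_labels)
    images_with = Counter(c for labels in image_labels for c in set(labels))
    return {c: max(1.0, math.sqrt(threshold / (k / n_images)))
            for c, k in images_with.items()}


def image_repeat_factors(image_labels, threshold):
    """Each image inherits the largest repeat factor among its classes (1.0 if unlabeled)."""
    r_class = class_repeat_factors(image_labels, threshold)
    return [max((r_class[c] for c in set(labels)), default=1.0) for labels in image_labels]


def sample_epoch(repeat_factors, rng=None):
    """Stochastic rounding: image i appears floor(r_i) times, plus once more with prob frac(r_i)."""
    rng = rng if rng is not None else random.Random()
    indices = []
    for i, r in enumerate(repeat_factors):
        whole = math.floor(r)
        reps = whole + (1 if rng.random() < r - whole else 0)
        indices.extend([i] * reps)
    return indices
```

Frames containing a spill hazard would then be seen several times per epoch while tote-only frames are seen about once; per-class loss weighting or focal loss are alternatives with different trade-offs.
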
Missing/quality issues: ~7% frames have partial labeling (only safety-critical classes labeled), ~3% have noisy boxes (tightness varies), and ~12% are near-duplicates.
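
For the ~7% of partially labeled frames, one way to avoid training on false negatives is to mask the classification loss so that classes not annotated in a frame contribute no gradient, neither positive nor negative. This is a minimal sketch using per-class sigmoid binary cross-entropy on plain Python lists. The brief does not say which classes count as safety-critical, so the `SAFETY_CRITICAL` set is hypothetical, as are the function names.

```python
import math

CLASSES = ["person", "forklift", "pallet", "tote", "barcode_label", "spill_hazard"]
# Hypothetical: the brief does not list which classes are labeled in partial frames.
SAFETY_CRITICAL = {"person", "forklift", "spill_hazard"}


def bce(logit, target):
    """Numerically stable binary cross-entropy computed from a single logit."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def masked_cls_loss(logits, targets, partially_labeled):
    """Mean per-class BCE over anchors; in partially labeled frames, classes outside
    the annotated set are masked out instead of being treated as background."""
    labeled = SAFETY_CRITICAL if partially_labeled else set(CLASSES)
    mask = [1.0 if c in labeled else 0.0 for c in CLASSES]
    total, n_terms = 0.0, 0.0
    for anchor_logits, anchor_targets in zip(logits, targets):
        for m, z, y in zip(mask, anchor_logits, anchor_targets):
            total += m * bce(z, y)
            n_terms += m
    return total / max(n_terms, 1.0)
```

For detectors with a separate objectness or background term (many YOLO-style heads), that term would need equivalent handling, which this sketch does not cover.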

Success Criteria

Safety-critical: person class recall ≥ 0.97 at precision ≥ 0.90 on the held-out test set.
Overall detection: mAP@[0.5:0.95] ≥ 0.42 overall, and mAP@0.5 ≥ 0.75 for person/forklift.
Latency: p95 end-to-end inference ≤ 60 ms per frame on the target edge GPU (including preprocessing + NMS).
Robustness: relative mAP drop between day and night shifts ≤ 10%.
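
The person criterion is an operating-point constraint, so a threshold has to be chosen on held-out data rather than read off mAP. The sketch below assumes person detections have already been matched to ground truth after NMS (the brief does not fix the matching IoU; 0.5 would be a typical choice). It scans score thresholds from high to low and returns the highest one that meets both recall ≥ 0.97 and precision ≥ 0.90. It also includes a helper for the day/night gap, measured here against the better of the two shifts (the brief does not say which shift is the reference), and a nearest-rank p95 latency. All names are hypothetical.

```python
def pick_person_threshold(scores, is_tp, n_gt, min_recall=0.97, min_precision=0.90):
    """Return (threshold, recall, precision) for the highest score threshold that meets
    both targets, or None if no threshold does.

    scores/is_tp: one entry per person detection, after NMS and IoU matching
    (is_tp is True when the detection matched a ground-truth box).
    n_gt: number of ground-truth person boxes.
    """
    pairs = sorted(zip(scores, is_tp), key=lambda p: p[0], reverse=True)
    tp = fp = 0
    for k, (score, hit) in enumerate(pairs):
        tp += int(hit)
        fp += int(not hit)
        if k + 1 < len(pairs) and pairs[k + 1][0] == score:
            continue  # detections with equal scores are kept or dropped together
        recall, precision = tp / n_gt, tp / (tp + fp)
        if recall >= min_recall and precision >= min_precision:
            return score, recall, precision
    return None


def relative_map_drop(day_map, night_map):
    """Relative gap between shifts, measured against the better shift (target: <= 0.10)."""
    best = max(day_map, night_map)
    return (best - min(day_map, night_map)) / best


def p95_latency(latencies_ms):
    """Nearest-rank 95th percentile of per-frame end-to-end latencies (target: <= 60 ms)."""
    xs = sorted(latencies_ms)
    rank = -(-95 * len(xs) // 100)  # integer ceil(0.95 * n), avoids float rounding
    return xs[rank - 1]
```

Choosing the highest feasible threshold keeps false positives (and robot slowdowns) as low as possible while still meeting the safety recall target; if the function returns None, the model does not meet the person criterion at any threshold.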

Constraints

Deployment: model must run on-device; no cloud calls. Target runtime is TensorRT.
Compute budget: training on 8×A100 for up to 48 hours; iteration speed matters.
Data leakage risk: frames from the same camera_id are highly correlated; random splits will inflate metrics.
Label quality: in partially labeled frames, unannotated objects are indistinguishable from background, so naive negative sampling can create false-negative training labels.

Deliverables (what you must walk through)

Propose an end-to-end approach (model family, training recipe, augmentation, loss choices) for object detection.
Define a data splitting strategy that avoids leakage and reflects deployment.
Explain how you will handle class imbalance, partial labels, and domain shifts (day/night, warehouses).
Specify evaluation metrics and how you will choose operating thresholds for safety-critical classes.
Outline a production plan: export, quantization, monitoring, and retraining triggers.
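
For the monitoring and retraining-trigger deliverable, one label-free signal is drift in the distribution of on-device person confidence scores relative to a baseline from the validation period. This sketch computes the population stability index (PSI) over fixed score bins and flags retraining when it exceeds 0.2, a common rule-of-thumb cutoff that is an assumption here rather than something the brief specifies. PSI detects distribution shift, not accuracy loss, so in practice it would complement, not replace, periodically labeled audit sets. Function names are hypothetical.

```python
import bisect
import math


def histogram(values, edges):
    """Count values per bin; out-of-range values are clamped into the first/last bin."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        idx = bisect.bisect_right(edges, v) - 1
        counts[min(max(idx, 0), len(counts) - 1)] += 1
    return counts


def psi(baseline, current, edges, eps=1e-6):
    """Population stability index: sum over bins of (q - p) * ln(q / p)."""
    b, c = histogram(baseline, edges), histogram(current, edges)
    nb, nc = sum(b), sum(c)
    if nb == 0 or nc == 0:
        raise ValueError("both samples must be non-empty")
    total = 0.0
    for bi, ci in zip(b, c):
        p = max(bi / nb, eps)  # eps avoids log(0) for empty bins
        q = max(ci / nc, eps)
        total += (q - p) * math.log(q / p)
    return total


def should_retrain(baseline_scores, recent_scores, psi_threshold=0.2):
    """Flag retraining when recent person-confidence scores drift from the baseline."""
    edges = [i / 10 for i in range(11)]  # ten equal-width bins over [0, 1]
    return psi(baseline_scores, recent_scores, edges) > psi_threshold
```

Running this per warehouse and per shift would localize drift (for example, a seasonal lighting change at one site) rather than averaging it away across the fleet.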
