Business Context
You’re interviewing for an ML role on the FulfillBot team at a large e-commerce logistics company operating 120 fulfillment centers. The company is rolling out autonomous mobile robots that navigate aisles and pick totes. A key safety and productivity requirement is real-time object detection from an RGB camera mounted on the robot: detect people, forklifts, pallets, totes, barcode labels, and spill hazards.
The system runs on an edge GPU (Jetson-class) and must avoid collisions (safety) while keeping throughput high (revenue). A missed detection of a person is a serious safety incident; false positives reduce speed and cause traffic jams.
Dataset
You are given a labeled dataset collected from 3 months of operations across 30 warehouses.
| Feature Group | Details |
|---|---|
| Images | 1.8M RGB frames at 1280×720; video captured at 10 FPS and subsampled to 1 FPS for labeling; motion blur and low light are common |
| Labels | Bounding boxes, each with one of 6 classes. Some frames contain multiple objects; occlusion is frequent |
| Environments | Narrow aisles, reflective floors, seasonal lighting changes, camera vibration |
| Split metadata | warehouse_id, camera_id, timestamp, zone_type (aisle/dock), shift (day/night) |
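The split metadata carries camera_id and warehouse_id. To keep correlated frames on one side of the split, you can assign whole groups to a split rather than individual frames. Below is a minimal, hypothetical sketch. The function names and the `salt` parameter are illustrative, and frames are assumed to be dicts keyed by the metadata columns. Each group key is hashed to a stable number in [0, 1) and bucketed.

```python
import hashlib
from collections import defaultdict


def split_for_group(group_id, val_frac=0.1, test_frac=0.1, salt="split-v1"):
    """Deterministically map a group key (e.g. a camera_id) to "train", "val" or "test".

    Every frame that shares the group key lands in the same split, so highly
    correlated frames from one camera cannot straddle train and test.
    """
    digest = hashlib.sha256(f"{salt}:{group_id}".encode("utf-8")).hexdigest()
    # Turn the first 32 bits of the hash into a number in [0, 1).
    u = int(digest[:8], 16) / 2**32
    if u < test_frac:
        return "test"
    if u < test_frac + val_frac:
        return "val"
    return "train"


def grouped_split(frames, key="camera_id", **kwargs):
    """Bucket frame records (dicts) by the split assigned to their group key."""
    splits = defaultdict(list)
    for frame in frames:
        splits[split_for_group(str(frame[key]), **kwargs)].append(frame)
    return dict(splits)
```

Calling `grouped_split(frames, key="warehouse_id")` holds out entire warehouses instead. This is stricter and closer to rolling the robots into a new site. Because assignment happens per group, the realized split sizes only approximate `val_frac` and `test_frac`.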
- Class distribution (by box count): person 41%, tote 33%, pallet 15%, forklift 9%, barcode label 1.5%, spill hazard 0.5% (a long tail)
- Missing-label and quality issues: ~7% of frames are partially labeled (only safety-critical classes are annotated), ~3% have noisy boxes (inconsistent box tightness), and ~12% are near-duplicates.
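For the long-tail classes (barcode label, spill hazard), one option is repeat-factor sampling, the scheme introduced with the LVIS dataset. Each class c gets a factor r_c = max(1, sqrt(t / f_c)), where f_c is the fraction of training images containing c. Each image is then repeated according to its rarest class. The sketch assumes per-image label sets are available. The threshold t = 0.05 is an illustrative value, not a tuned one, and the helper names are hypothetical.

```python
import math
import random
from collections import Counter


def repeat_factors(image_labels, threshold=0.05):
    """Per-image repeat factors for repeat-factor sampling.

    image_labels: one collection of class names per training image.
    threshold: class frequency t below which a class is oversampled.
    """
    n = len(image_labels)
    # f_c = fraction of images containing at least one box of class c.
    image_counts = Counter(c for labels in image_labels for c in set(labels))
    class_rf = {c: max(1.0, math.sqrt(threshold / (k / n))) for c, k in image_counts.items()}
    # An image inherits the factor of its rarest class; images with no boxes get 1.
    return [max((class_rf[c] for c in labels), default=1.0) for labels in image_labels]


def epoch_indices(factors, rng):
    """Expand fractional repeat factors into one epoch of image indices via stochastic rounding."""
    indices = []
    for i, r in enumerate(factors):
        whole = int(r)
        repeats = whole + (1 if rng.random() < r - whole else 0)
        indices.extend([i] * repeats)
    rng.shuffle(indices)
    return indices
```

The distribution above is by box count, whereas repeat-factor sampling uses image-level frequencies. The factors should therefore be computed from the training split itself. Partially labeled frames may also under-count the classes that were not annotated in them.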
Success Criteria
- Safety-critical: person class recall ≥ 0.97 at precision ≥ 0.90 on the held-out test set.
- Overall detection: mAP@[0.5:0.95] ≥ 0.42 overall, and mAP@0.5 ≥ 0.75 for person/forklift.
- Latency: p95 end-to-end inference ≤ 60 ms per frame on the target edge GPU (including preprocessing + NMS).
- Robustness: relative mAP drop from day to night shifts ≤ 10%, i.e. (mAP_day - mAP_night) / mAP_day ≤ 0.10.
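To check the safety criterion and pick an operating point for the person class, you can sweep the score threshold on the validation split. Keep the cutoff with the highest recall whose precision is still ≥ 0.90, then confirm the result on the held-out test set. This minimal sketch assumes detections have already been matched to ground truth at some IoU (for example, greedy matching at IoU ≥ 0.5). The function name is hypothetical.

```python
from typing import Optional, Sequence, Tuple


def pick_threshold(
    scores: Sequence[float],
    is_tp: Sequence[bool],
    num_gt: int,
    min_precision: float = 0.90,
    min_recall: float = 0.97,
) -> Optional[Tuple[float, float, float]]:
    """Return (threshold, precision, recall) for the cutoff that maximizes recall
    subject to precision >= min_precision, or None if recall stays below min_recall.

    scores / is_tp cover every detection of one class on the validation split,
    already matched to ground truth; num_gt is that class's ground-truth box count.
    Detections with score >= threshold are kept.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    best = None
    for rank, i in enumerate(order):
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        # Tied scores cannot be separated by a threshold, so evaluate only after the last tie.
        if rank + 1 < len(order) and scores[order[rank + 1]] == scores[i]:
            continue
        precision = tp / (tp + fp)
        recall = tp / num_gt if num_gt else 0.0
        if precision >= min_precision and (best is None or recall > best[2]):
            best = (scores[i], precision, recall)
    if best is None or best[2] < min_recall:
        return None
    return best
```

The same routine can be run separately on day-shift and night-shift frames to see whether a single threshold holds up across both.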
Constraints
- Deployment: model must run on-device; no cloud calls. Target runtime is TensorRT.
- Compute budget: training on 8×A100 for up to 48 hours; iteration speed matters.
- Data leakage risk: frames from the same camera_id are highly correlated; random splits will inflate metrics.
- Label quality: with partial labeling, naive negative sampling can treat unlabeled objects as background, effectively training the model on false negatives.
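A common way to handle partial labels is to treat unannotated classes in those frames as unknown rather than negative. Their classification loss is dropped for negatives, while labeled boxes still count as positives. The sketch below applies this to a per-class sigmoid binary cross-entropy and assumes each frame records which classes were annotated. The `labeled_classes` argument and the class-name spellings are hypothetical. It is written in pure Python for clarity; a training loop would apply the same mask to tensors.

```python
import math

CLASSES = ["person", "forklift", "pallet", "tote", "barcode_label", "spill_hazard"]


def partial_label_bce(logits, targets, labeled_classes, classes=CLASSES):
    """Mean per-class sigmoid BCE for one frame's predictions, skipping negatives
    for classes that were not annotated in this frame.

    logits, targets: one row per matched prediction/anchor, one column per class
    (targets are 0/1). labeled_classes: the classes annotators labeled in this frame.
    """
    total, count = 0.0, 0
    for row_logits, row_targets in zip(logits, targets):
        for cls, z, y in zip(classes, row_logits, row_targets):
            if y == 0 and cls not in labeled_classes:
                # Unknown rather than negative: the object may be present but unlabeled.
                continue
            # Numerically stable BCE with logits: max(z, 0) - z*y + log(1 + exp(-|z|)).
            total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
            count += 1
    return total / count if count else 0.0
```

For fully labeled frames, pass the full class list so every negative counts.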
Deliverables (what you must walk through)
- Propose an end-to-end approach (model family, training recipe, augmentation, loss choices) for object detection.
- Define a data splitting strategy that avoids leakage and reflects deployment.
- Explain how you will handle class imbalance, partial labels, and domain shifts (day/night, warehouses).
- Specify evaluation metrics and how you will choose operating thresholds for safety-critical classes.
- Outline a production plan: export, quantization, monitoring, and retraining triggers.
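For retraining triggers, one option is to compare a monitored on-device signal against a reference window using the population stability index (PSI). PSI = Σ_i (a_i - e_i) ln(a_i / e_i), where e_i and a_i are the reference and live fractions in bin i. The sketch assumes values scaled to [0, 1], such as per-frame maximum person confidence or normalized frame brightness. The choice of signal, the 0.2 cutoff, and the function names are all hypothetical and would need calibrating against the fleet's own history.

```python
import math


def psi(reference, live, bins=10, eps=1e-6):
    """Population stability index between two samples of values in [0, 1]
    (e.g. per-frame detector confidences)."""
    def histogram(values):
        counts = [0] * bins
        for v in values:
            # Clamp so that 1.0 (and any out-of-range value) falls in an edge bin.
            counts[min(max(int(v * bins), 0), bins - 1)] += 1
        n = len(values)
        # eps avoids log(0) for empty bins.
        return [max(c / n, eps) for c in counts]

    ref, cur = histogram(reference), histogram(live)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


def should_retrain(reference, live, threshold=0.2):
    """Hypothetical trigger: flag the model for retraining when PSI exceeds a threshold."""
    return psi(reference, live) > threshold
```

Running this per warehouse and per shift would surface drift such as seasonal lighting changes before it shows up as safety incidents.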