RoboCourier is training an imitation-learning policy for low-speed warehouse cart navigation across 12 fulfillment centers. A behavior cloning model trained only on expert demonstrations performs well offline but degrades sharply during live rollouts because small mistakes push the cart into states the expert dataset rarely contains.
You are given logged state-action trajectories from a human teleoperator, plus additional states collected during policy rollouts that the expert can re-label. The goal is to implement and evaluate a DAgger-style training loop that reduces compounding errors relative to standard behavior cloning; a minimal sketch of such a loop appears after the feature table below. Each logged record contains the following feature groups:
| Feature Group | Count | Examples |
|---|---|---|
| Sensor state | 96 | lidar bins, depth summaries, obstacle distances |
| Kinematics | 8 | speed, yaw rate, steering angle, acceleration |
| Route context | 12 | waypoint heading error, distance-to-goal, aisle width |
| Action labels | 2 | steering command, throttle command |
| Metadata | 5 | site_id, robot_id, timestamp, episode_id, intervention_flag |
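The sketch below illustrates one way the DAgger loop could be structured, assuming a hypothetical `env` with `reset()`/`step(action)` for closed-loop rollouts and a hypothetical `expert_action(state)` relabeling oracle; neither interface is part of the provided logs, and the sklearn regressor simply stands in for whatever policy class is actually used.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

STATE_DIM = 96 + 8 + 12   # sensor + kinematics + route-context features
ACTION_DIM = 2            # steering command, throttle command

def train_policy(states: np.ndarray, actions: np.ndarray) -> MLPRegressor:
    """Fit a behavior-cloning regressor on the aggregated dataset."""
    policy = MLPRegressor(hidden_layer_sizes=(128, 128), max_iter=500)
    policy.fit(states, actions)
    return policy

def dagger(expert_states, expert_actions, env, expert_action,
           n_iters=5, rollout_steps=2000, beta0=0.5):
    """DAgger: repeatedly roll out the learned policy, have the expert relabel
    the visited states, aggregate them into the dataset, and retrain."""
    states, actions = expert_states.copy(), expert_actions.copy()
    policy = train_policy(states, actions)           # iteration 0 is plain behavior cloning
    for i in range(1, n_iters + 1):
        beta = beta0 ** i                            # decaying expert-mixing weight
        new_states, new_labels = [], []
        s = env.reset()
        for _ in range(rollout_steps):
            a_expert = expert_action(s)              # expert labels every visited state
            a_policy = policy.predict(s.reshape(1, -1))[0]
            a = a_expert if np.random.rand() < beta else a_policy
            new_states.append(s)
            new_labels.append(a_expert)              # training labels always come from the expert
            s, done = env.step(a)                    # assumed (next_state, done) return
            if done:
                s = env.reset()
        # Dataset aggregation: D <- D ∪ {(s, expert(s))}
        states = np.vstack([states, np.array(new_states)])
        actions = np.vstack([actions, np.array(new_labels)])
        policy = train_policy(states, actions)
    return policy
```

The key difference from behavior cloning is that later iterations train on states the learned policy actually visits, which is exactly where compounding errors originate.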
A strong solution demonstrates that DAgger improves closed-loop performance over behavior cloning by reducing both the intervention rate and the cumulative trajectory deviation. The minimum bar is at least a 25% reduction in interventions per 1,000 meters relative to the behavior cloning baseline, while keeping inference under 20 ms per step.
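A rough sketch of these evaluation metrics is below; `rollout_log` is an assumed list of per-step dicts with keys such as `"intervention_flag"`, `"distance_m"`, and `"deviation_m"`, not a confirmed log schema.

```python
import time
import numpy as np

def interventions_per_1000m(rollout_log):
    # Interventions normalized by distance traveled, in events per 1,000 meters.
    interventions = sum(step["intervention_flag"] for step in rollout_log)
    meters = sum(step["distance_m"] for step in rollout_log)
    return 1000.0 * interventions / max(meters, 1e-6)

def cumulative_deviation_m(rollout_log):
    # Sum of per-step lateral deviation from the reference trajectory, in meters.
    return float(np.sum([step["deviation_m"] for step in rollout_log]))

def mean_inference_ms(policy, states, n=1000):
    # Rough per-step latency estimate to check against the 20 ms budget.
    n = min(n, len(states))
    start = time.perf_counter()
    for s in states[:n]:
        policy.predict(s.reshape(1, -1))
    return 1000.0 * (time.perf_counter() - start) / n
```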