Business Context
AutoPilot, a leading provider of autonomous drone delivery systems, has recently deployed a new AI agent designed to manage multi-step workflows for package deliveries. However, the agent has been failing to complete these workflows, causing delayed deliveries and a rise in customer complaints. The operations team needs a robust strategy for monitoring and debugging the agent so that it runs reliably and efficiently in real-world conditions.
Dataset Description
| Feature Group | Count | Examples |
|---|---|---|
| Workflow Events | 10 | event_type, timestamp, step_id |
| Agent States | 5 | battery_level, location, status |
| Environmental Factors | 8 | weather_condition, traffic_level |
| Action Outcomes | 3 | success, failure, retry_count |
- Size: 100K workflow executions, 26 features
- Target: Multi-class outcome of each workflow execution (success, failure, or timeout)
- Class balance: 70% success, 20% failure, 10% timeout
- Missing data: 5% missing in environmental features due to sensor errors
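Before building any monitoring, it can help to confirm that incoming data matches the class balance and missing-data figures listed above. The following is a minimal sketch, assuming records arrive as Python dictionaries. The `outcome` column name and the `summarize` helper are hypothetical, and only the two environmental features named in the table are checked.

```python
from collections import Counter

# Only these two environmental features are named in the brief; the other six are unknown.
ENV_FEATURES = ("weather_condition", "traffic_level")


def summarize(records, target="outcome"):
    """Return (class shares, per-feature missing rates) for a list of dict records.

    The `target` column name is an assumption; the brief does not name it.
    """
    n = len(records)
    if n == 0:
        return {}, {}
    counts = Counter(r[target] for r in records)
    class_share = {label: c / n for label, c in counts.items()}
    missing_rate = {
        f: sum(1 for r in records if r.get(f) is None) / n for f in ENV_FEATURES
    }
    return class_share, missing_rate


# Tiny illustrative sample, not real AutoPilot data.
sample = [
    {"outcome": "success", "weather_condition": "clear", "traffic_level": 2},
    {"outcome": "success", "weather_condition": None, "traffic_level": 1},
    {"outcome": "failure", "weather_condition": "rain", "traffic_level": 3},
    {"outcome": "timeout", "weather_condition": "wind", "traffic_level": None},
]
class_share, missing_rate = summarize(sample)
print(class_share)
print(missing_rate)
```

On the full dataset, shares far from 70/20/10 or a missing rate far from 5% would suggest a data pipeline problem rather than an agent problem.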
Success Criteria
- Identify the root cause of at least 75% of workflow failures.
- Improve the success rate of workflows by 20% within the next quarter.
- Develop a monitoring dashboard that visualizes key metrics and alerts for anomalies.
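One way the dashboard's anomaly alert could work is a rolling success rate compared against a threshold. This is a sketch under stated assumptions: the `SuccessRateMonitor` class, the window size, and the 0.60 alert threshold are all hypothetical choices, picked only to sit below the 70% baseline success rate given in the dataset description.

```python
from collections import deque


class SuccessRateMonitor:
    """Track the success rate over the most recent `window` workflow executions."""

    def __init__(self, window=100, alert_below=0.60):
        # deque with maxlen drops the oldest outcome automatically.
        self.outcomes = deque(maxlen=window)
        self.alert_below = alert_below  # hypothetical threshold, not from the brief

    def record(self, outcome):
        """Store one outcome ('success', 'failure', or 'timeout'); return the current rate."""
        self.outcomes.append(outcome == "success")
        return self.success_rate()

    def success_rate(self):
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def is_anomalous(self):
        """Alert only once the window is full, to avoid noisy alerts at startup."""
        full = len(self.outcomes) == self.outcomes.maxlen
        return full and self.success_rate() < self.alert_below


monitor = SuccessRateMonitor(window=10, alert_below=0.60)
for outcome in ["success"] * 5 + ["failure"] * 3 + ["timeout"] * 2:
    monitor.record(outcome)

print(monitor.success_rate(), monitor.is_anomalous())
```

A fixed threshold is easy for non-specialist staff to reason about. A statistical control limit would be a reasonable alternative if alerts turn out to be too noisy.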
Constraints
- Monitoring must run in real time, with a maximum latency of 1 second.
- Insights must be interpretable by operations staff who are not data scientists.
- Budget constraints limit the use of expensive monitoring tools or extensive engineering resources.
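The 1-second latency limit can be checked cheaply by timing each monitoring update with the standard library, which also fits the budget constraint. The sketch below is illustrative only: `timed` and `update_metrics` are hypothetical helpers, and the `event_type` values are made up for the example.

```python
import time

LATENCY_BUDGET_S = 1.0  # maximum monitoring latency stated in the constraints


def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed_seconds, within_budget)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed, elapsed <= LATENCY_BUDGET_S


def update_metrics(events):
    """Hypothetical stand-in for one monitoring update over a batch of events."""
    failures = sum(1 for e in events if e["event_type"] == "failure")
    return {"events": len(events), "failures": failures}


# Synthetic batch: every fifth event is a failure (illustrative values only).
events = [
    {"event_type": "failure" if i % 5 == 0 else "step_completed"}
    for i in range(10_000)
]
metrics, elapsed, within_budget = timed(update_metrics, events)
print(metrics, f"{elapsed:.4f}s", "OK" if within_budget else "OVER BUDGET")
```

Logging `elapsed` alongside the metrics themselves lets the dashboard show when the monitor itself is falling behind.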
Deliverables
- A detailed monitoring strategy that includes key metrics to track.
- A debugging framework that outlines steps to diagnose workflow failures.
- A Python code snippet demonstrating the implementation of the monitoring and debugging strategy.
- A presentation summarizing findings and recommendations for improvements.
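As a starting point for the debugging framework deliverable, failures can be grouped by a single feature and ranked by failure rate. The result is a simple table that operations staff can read without statistical training. This is a minimal sketch, not a full root-cause method: `failure_rate_by`, the `outcome` column name, and the sample `step_id` values are hypothetical, and timeouts are counted as non-successes.

```python
from collections import defaultdict


def failure_rate_by(records, key, target="outcome"):
    """Rank values of `key` by share of non-success outcomes.

    Returns a list of (value, non_success_rate, total_count), highest rate first.
    Missing values are grouped under the label 'missing'.
    """
    totals = defaultdict(int)
    bad = defaultdict(int)
    for r in records:
        value = r.get(key)
        if value is None:
            value = "missing"
        totals[value] += 1
        if r[target] != "success":
            bad[value] += 1
    rows = [(v, bad[v] / totals[v], totals[v]) for v in totals]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Illustrative records only; step names are invented for the example.
records = [
    {"step_id": "takeoff", "weather_condition": "clear", "outcome": "success"},
    {"step_id": "takeoff", "weather_condition": "rain", "outcome": "success"},
    {"step_id": "dropoff", "weather_condition": "rain", "outcome": "failure"},
    {"step_id": "dropoff", "weather_condition": None, "outcome": "timeout"},
    {"step_id": "dropoff", "weather_condition": "clear", "outcome": "success"},
    {"step_id": "transit", "weather_condition": "clear", "outcome": "success"},
]
by_step = failure_rate_by(records, "step_id")
by_weather = failure_rate_by(records, "weather_condition")
for value, rate, total in by_step:
    print(f"{value:<10} {rate:.0%} of {total}")
```

Keeping the total count next to each rate matters: a 100% failure rate over one execution is much weaker evidence than 40% over thousands.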