Product Context
Agero operates roadside assistance and towing dispatch through surfaces used by dispatch agents, service providers, and drivers. Design an end-to-end ML-driven dispatch optimization system that recommends a provider and route for each incoming roadside event, balancing ETA, completion likelihood, cost, and member experience.
Scale
| Signal | Value |
|---|---|
| Roadside events/day | 1.2M |
| Peak dispatch decision QPS | 450 |
| Active service providers | 75K |
| Active drivers/vehicles at peak | 110K |
| Candidate providers per event | 50-300 within geo radius |
| Re-dispatch / update rate | 20% of events |
| p99 decision latency budget | 800ms |
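The figures in the table can be turned into a rough load estimate for the ranking stage. The sketch below is back-of-envelope arithmetic only. It assumes (this is not stated in the table) that each re-dispatched event triggers exactly one extra decision and that the ranker scores every candidate within the geo radius. The function name `implied_load` is illustrative.

```python
# Back-of-envelope sizing; all constants are copied from the Scale table.
EVENTS_PER_DAY = 1_200_000
PEAK_DECISION_QPS = 450
CANDIDATES_MIN, CANDIDATES_MAX = 50, 300
REDISPATCH_RATE = 0.20
SECONDS_PER_DAY = 24 * 60 * 60


def implied_load() -> dict:
    """Derive average load and peak scoring volume from the table values."""
    avg_event_qps = EVENTS_PER_DAY / SECONDS_PER_DAY
    # Assumption: one additional decision per re-dispatched event.
    avg_decision_qps = avg_event_qps * (1 + REDISPATCH_RATE)
    return {
        "avg_event_qps": avg_event_qps,
        "avg_decision_qps": avg_decision_qps,
        "peak_to_avg_ratio": PEAK_DECISION_QPS / avg_decision_qps,
        # Assumption: every candidate in the radius is scored per decision.
        "peak_candidate_scores_per_s_low": PEAK_DECISION_QPS * CANDIDATES_MIN,
        "peak_candidate_scores_per_s_high": PEAK_DECISION_QPS * CANDIDATES_MAX,
    }


if __name__ == "__main__":
    for name, value in implied_load().items():
        print(f"{name}: {value:,.1f}")
```

Under these assumptions the peak is well above the daily average, and the upper bound on candidate scorings per second is what motivates a cheap retrieval stage ahead of a heavier ranker.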
Task
- Clarify the product objective and define the optimization target across ETA, acceptance, completion, and cost.
- Design a multi-stage ML system for candidate retrieval, ranking, and re-ranking / constrained optimization.
- Specify the offline and online data pipelines, feature store design, and training cadence.
- Propose model choices for each stage, including how routing signals and provider behavior are incorporated.
- Define evaluation, experimentation, monitoring, and rollback strategy.
- Identify major failure modes, including feature drift, training-serving skew, and operational outages.
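As a starting point for the first task item, one possible way to combine ETA, acceptance, completion, and cost is a single expected-loss score expressed in minutes-equivalent units. The sketch below is illustrative, not a prescribed answer: the functional form, the `Weights` defaults, and all field names are assumptions, and the probabilities are presumed to come from upstream models that are not shown.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    provider_id: str
    eta_min: float      # predicted minutes to arrival
    p_accept: float     # predicted probability the provider accepts
    p_complete: float   # predicted probability of completion given acceptance
    cost_usd: float     # expected payout for the job


@dataclass(frozen=True)
class Weights:
    # Hypothetical trade-off weights; in practice these would be tuned.
    eta_per_min: float = 1.0
    cost_per_usd: float = 0.2
    failure_penalty: float = 60.0  # minutes-equivalent cost of a failed dispatch


def dispatch_score(c: Candidate, w: Weights = Weights()) -> float:
    """Negative expected loss: higher is better."""
    p_success = c.p_accept * c.p_complete
    expected_loss = (
        w.eta_per_min * c.eta_min
        + w.cost_per_usd * c.cost_usd
        + w.failure_penalty * (1.0 - p_success)
    )
    return -expected_loss


def rank(candidates: List[Candidate], w: Weights = Weights()) -> List[Candidate]:
    """Order candidates from best to worst under the scalarized objective."""
    return sorted(candidates, key=lambda c: dispatch_score(c, w), reverse=True)


if __name__ == "__main__":
    a = Candidate("A", eta_min=20, p_accept=0.90, p_complete=0.95, cost_usd=80)
    b = Candidate("B", eta_min=15, p_accept=0.50, p_complete=0.90, cost_usd=70)
    for c in rank([b, a]):
        print(c.provider_id, round(dispatch_score(c), 2))
```

Because each term of the loss is additive, the per-term contributions can be shown to a dispatch agent alongside the recommendation, which helps with the explainability requirement.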
Constraints
- The system must support real-time dispatch decisions for high-severity roadside events where latency directly affects customer wait time.
- Some labels are delayed or censored: true completion and final ETA error may arrive 30-120 minutes later.
- Provider availability, traffic, and job queues change minute to minute, so stale features are costly.
- The system must remain explainable enough for Agero dispatch operations to override recommendations when needed.
- Compliance and contractual rules may constrain which providers can service certain geographies, vehicle types, or OEM programs.
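The compliance, staleness, and explainability constraints above suggest applying hard eligibility rules before any learned ranking, with a recorded reason for every rejection. The sketch below is one minimal way to do that. The schema (`Provider`, `Event`), the reason codes, and the `MAX_AVAILABILITY_AGE_S` threshold are hypothetical, and treating stale availability as a hard rejection rather than a score penalty is a design assumption.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Tuple

MAX_AVAILABILITY_AGE_S = 120.0  # hypothetical freshness threshold


@dataclass(frozen=True)
class Provider:
    provider_id: str
    service_zones: FrozenSet[str]
    vehicle_classes: FrozenSet[str]
    oem_programs: FrozenSet[str]
    availability_age_s: float  # seconds since the last availability update


@dataclass(frozen=True)
class Event:
    zone: str
    vehicle_class: str
    oem_program: Optional[str] = None


def ineligibility_reasons(p: Provider, e: Event) -> List[str]:
    """Return reason codes for rejecting p; an empty list means eligible."""
    reasons = []
    if e.zone not in p.service_zones:
        reasons.append("outside_contracted_zone")
    if e.vehicle_class not in p.vehicle_classes:
        reasons.append("vehicle_class_not_supported")
    if e.oem_program is not None and e.oem_program not in p.oem_programs:
        reasons.append("not_in_oem_program")
    if p.availability_age_s > MAX_AVAILABILITY_AGE_S:
        reasons.append("stale_availability")
    return reasons


def filter_eligible(
    providers: List[Provider], e: Event
) -> Tuple[List[Provider], Dict[str, List[str]]]:
    """Split providers into eligible ones and a rejected-id -> reasons map."""
    eligible, rejected = [], {}
    for p in providers:
        reasons = ineligibility_reasons(p, e)
        if reasons:
            rejected[p.provider_id] = reasons
        else:
            eligible.append(p)
    return eligible, rejected
```

Keeping the rejected map lets dispatch operations see why a nearby provider was not recommended, and lets an override path re-admit a provider deliberately.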