You are building an ML-based routing layer for a large service-oriented application composed of many backend microservices. Each incoming request can be served by multiple valid service paths, but latency, error rate, cost, and downstream load vary over time. The business wants to improve end-user latency and availability while reducing expensive overprovisioning. Your system should choose the best service path per request and adapt quickly to changing service health and traffic patterns.
| Signal | Value |
|---|---|
| DAU | 45M |
| Peak request QPS | 220K |
| Backend microservices | 1,200 |
| Valid candidate paths per request | 50-500 |
| Per-request routing latency budget (p99) | 25ms |
| Service health signal freshness | < 10s |
How would you design this end-to-end ML system so it can retrieve candidate service paths, rank them online, and safely serve routing decisions at scale while handling drift, skew, and failures in the underlying microservices environment?