You are building a shared rate-limiting service for a distributed platform with many microservices. Static thresholds are causing false blocks during normal bursts and missing abusive traffic patterns, so the team wants an ML-assisted system that can adapt limits by client, endpoint, and behavior.
How would you design a rate-limiting service that can handle millions of requests per minute across multiple microservices?
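
As a starting point for discussion, the adaptive part of the scenario can be sketched as a token bucket keyed by client and endpoint, with limits that an external model scales up or down. This is a minimal single-process illustration, not a proposed design: the names `TokenBucket`, `AdaptiveLimiter`, and `set_multiplier` are hypothetical, and the ML model is represented only by a multiplier passed in from outside. A service handling millions of requests per minute across microservices would also need shared or sharded state, which this sketch does not attempt.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class TokenBucket:
    rate: float      # tokens refilled per second
    capacity: float  # maximum burst size
    tokens: float    # tokens currently available
    updated: float   # timestamp of the last refill

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class AdaptiveLimiter:
    """In-memory limiter with one bucket per (client, endpoint) pair."""

    def __init__(self, base_rate: float, base_burst: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.base_rate = base_rate
        self.base_burst = base_burst
        self.clock = clock
        self.buckets: Dict[Tuple[str, str], TokenBucket] = {}

    def _bucket(self, client: str, endpoint: str) -> TokenBucket:
        key = (client, endpoint)
        if key not in self.buckets:
            # New buckets start full so a first burst is not rejected.
            self.buckets[key] = TokenBucket(
                self.base_rate, self.base_burst, self.base_burst, self.clock()
            )
        return self.buckets[key]

    def set_multiplier(self, client: str, endpoint: str, multiplier: float) -> None:
        # Hypothetical hook: a behavior model would call this to loosen
        # (multiplier > 1) or tighten (multiplier < 1) a specific limit.
        bucket = self._bucket(client, endpoint)
        bucket.rate = self.base_rate * multiplier
        bucket.capacity = self.base_burst * multiplier
        bucket.tokens = min(bucket.tokens, bucket.capacity)

    def allow(self, client: str, endpoint: str) -> bool:
        return self._bucket(client, endpoint).allow(self.clock())
```

Separating the burst capacity from the refill rate is what lets normal bursts through without raising the sustained limit, and the per-key multiplier is the single point where a model's output would enter the decision path.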