Product Context
DevFlow provides a public API used by third-party developers for payments, messaging, and analytics. The company wants to replace static per-key quotas with an ML-driven rate-limiting system that predicts abuse risk and allocates request budgets dynamically while protecting legitimate traffic.
Scale
| Signal | Value |
|---|---|
| API consumers | 8M registered API keys |
| Daily active API keys | 1.2M |
| Peak request rate | 450K QPS |
| Regions | 3 active-active regions |
| Historical request logs | 25B requests / day |
| Feature freshness target | < 60 seconds for behavioral features |
| Decision latency budget | 15ms p99 added latency |
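To make the feature-freshness target concrete, here is a minimal sketch of one behavioral feature: per-key request count over a trailing 60-second window. The class name and in-memory layout are illustrative assumptions; a production system would back this with a streaming store, but the windowing logic is the same.

```python
from collections import deque


class SlidingWindowCounter:
    """Per-key request count over a trailing window (e.g. 60s).

    A stand-in for one behavioral feature meeting the < 60s freshness
    target; production would use a streaming feature store instead.
    """

    def __init__(self, window_seconds: float = 60.0):
        self.window = window_seconds
        self.events: dict[str, deque] = {}

    def record(self, key: str, now: float) -> None:
        # Append the event timestamp for this API key.
        self.events.setdefault(key, deque()).append(now)

    def count(self, key: str, now: float) -> int:
        # Evict timestamps that have aged out of the window, then count.
        q = self.events.get(key)
        if q is None:
            return 0
        cutoff = now - self.window
        while q and q[0] <= cutoff:
            q.popleft()
        return len(q)
```

Because eviction happens lazily on read, `count` is O(evicted events) amortized, which keeps the hot path cheap under the 15ms p99 budget.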
Task
Design an end-to-end ML system for adaptive rate limiting. Your design should address:
- How to define the prediction target and business objective, balancing abuse prevention, fairness to legitimate developers, and minimal false throttling.
- The online serving architecture for making per-request allow / throttle / block decisions under a strict latency budget.
- A multi-stage decision pipeline (fast retrieval / coarse filtering → ML risk scoring → policy re-ranking or final action selection) and the models used at each stage.
- The offline and streaming data pipelines, including labels, delayed feedback, feature computation, and training cadence.
- Evaluation strategy: offline metrics, online experiments, and operational guardrails.
- Failure modes such as feature drift, training-serving skew, adversarial adaptation, and regional outages.
Constraints
- The system must never fully depend on ML; a deterministic fallback limiter is required.
- False positives are expensive: accidentally throttling top enterprise customers can cause revenue loss.
- Some abuse labels arrive hours or days later from fraud investigations, chargebacks, or downstream incident reports.
- The service must support tenant-specific policies, burst allowances, and compliance requirements for auditability.
- Cost matters: the incremental ML decisioning layer should stay below $0.00005 per request on average.
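The deterministic fallback required by the first constraint is typically a token bucket, which also expresses the burst-allowance requirement directly. A minimal sketch, with rate and burst as per-tenant configuration assumptions:

```python
class TokenBucket:
    """Deterministic fallback limiter: used whenever the ML decisioning
    layer is degraded or unavailable, so the system never fully depends
    on ML. rate = steady-state quota; burst = bucket capacity.
    """

    def __init__(self, rate: float, burst: float):
        self.rate = rate
        self.burst = burst
        self.tokens = burst   # start full: fresh keys get their burst
        self.last = 0.0

    def allow(self, now: float) -> bool:
        # Refill proportionally to elapsed time, capped at burst capacity.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

Per-tenant `rate`/`burst` values can be audited as plain configuration, which helps with the compliance requirement; the ML layer then only adjusts these budgets rather than replacing the limiter.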