Product Context
NovaAPI is a public API platform used by mobile apps, SaaS vendors, and internal services. The company wants to replace static per-key quotas with an ML-driven rate limiter that predicts abusive or bursty traffic in real time and decides whether to allow, delay, challenge, or reject requests while minimizing impact on legitimate customers.
Scale
| Signal | Value |
|---|---|
| Active API keys | 45M |
| DAU (developers / apps generating traffic) | 18M |
| Peak request rate | 3.5M RPS globally |
| Regions | 6 active-active regions |
| Distinct endpoints | 12K |
| Feature freshness target | < 5s for traffic counters |
| End-to-end decision latency budget | p99 < 15ms |
| Historical logs retained for training | 90 days (~22T requests) |
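
A few derived figures help size the problem. The sketch below uses only values from the table; the even split of peak traffic across regions is an assumption for illustration, not something the table states.

```python
# Back-of-envelope figures derived from the Scale table.
SECONDS_PER_DAY = 86_400

peak_rps = 3_500_000        # peak request rate, from the table
regions = 6                 # active-active regions, from the table
retained_requests = 22e12   # ~22T requests, from the table
retention_days = 90         # log retention window, from the table

# Average request rate implied by the retained training logs.
avg_rps = retained_requests / (retention_days * SECONDS_PER_DAY)

# ASSUMPTION: peak traffic is spread evenly across regions.
per_region_peak = peak_rps / regions

# Same assumption, with one region down and its traffic redistributed evenly.
per_region_peak_one_down = peak_rps / (regions - 1)

print(f"average RPS over the retention window: {avg_rps:,.0f}")
print(f"peak RPS per region (even split):      {per_region_peak:,.0f}")
print(f"peak RPS per region (one region down): {per_region_peak_one_down:,.0f}")
```

The implied average is roughly 2.8M RPS, so the 3.5M RPS peak is only about 25% above the long-run mean; under the even-split assumption, each region should be sized for about 700K RPS to absorb a single regional outage.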
Task
Design an end-to-end ML system for adaptive rate limiting. Your design should address:
- How to define the prediction problem and translate model outputs into rate-limit actions (allow, soft-throttle, hard-throttle, challenge).
- A multi-stage online architecture for high-throughput decisioning, including fast candidate policy retrieval, ML scoring, and final policy re-ranking / rule enforcement.
- The offline and streaming data pipelines for features, labels, training, and feedback loops, including delayed labels for abuse outcomes.
- Model choices for each stage, with clear tradeoffs between latency, interpretability, and recall of abusive traffic.
- Evaluation strategy: offline metrics, online experiments, and operational guardrails.
- Failure modes, especially feature drift, training-serving skew, regional outages, and false positives on high-value customers.
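
As one possible starting point for the first item, a model's output can be mapped to the four actions with score thresholds. This is a minimal sketch, not a prescribed design: the `abuse_score` input, the threshold values, and the severity ordering (challenge above hard-throttle) are all hypothetical and would need to be tuned against false-positive cost.

```python
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    SOFT_THROTTLE = "soft-throttle"
    HARD_THROTTLE = "hard-throttle"
    CHALLENGE = "challenge"


# HYPOTHETICAL thresholds on an abuse score in [0, 1], most severe first.
# The values and the ordering of actions are illustrative assumptions.
THRESHOLDS = [
    (0.90, Action.CHALLENGE),
    (0.70, Action.HARD_THROTTLE),
    (0.40, Action.SOFT_THROTTLE),
]


def choose_action(abuse_score: float) -> Action:
    """Return the most severe action whose threshold the score reaches."""
    if not 0.0 <= abuse_score <= 1.0:
        raise ValueError(f"abuse_score must be in [0, 1], got {abuse_score}")
    for threshold, action in THRESHOLDS:
        if abuse_score >= threshold:
            return action
    return Action.ALLOW


if __name__ == "__main__":
    for score in (0.10, 0.55, 0.75, 0.95):
        print(score, choose_action(score).value)
```

Keeping the mapping in a small table like this makes it straightforward to vary thresholds per customer tier or per endpoint in an online experiment.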
Constraints
- The system must preserve existing contractual limits for enterprise customers; ML can only tighten or relax within configured bounds.
- Some abuse labels arrive hours later from downstream fraud investigations or chargebacks.
- PII cannot be used directly in model features; features must satisfy internal privacy policy.
- The platform must fail open for a small allowlisted set of critical internal services, but fail closed for clearly malicious traffic patterns.
- Serving cost target is under $0.00002 per request at peak scale (about $70 per second at 3.5M RPS).
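
The bounds and fail-mode constraints can be expressed as a thin enforcement layer that runs after any ML decision. The sketch below is an illustration under stated assumptions: `LimitBounds`, `clamp_limit`, and `fallback_decision` are hypothetical names, and both the precedence of the allowlist over the malicious-pattern check and the static-quota fallback for all other traffic are assumptions the constraints do not spell out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LimitBounds:
    """Configured per-customer bounds (hypothetical structure)."""
    min_rps: float  # floor the ML limiter may not tighten below
    max_rps: float  # ceiling the ML limiter may not relax above


def clamp_limit(proposed_rps: float, bounds: LimitBounds) -> float:
    """Keep an ML-proposed limit inside the configured bounds."""
    return max(bounds.min_rps, min(proposed_rps, bounds.max_rps))


def fallback_decision(is_allowlisted_internal: bool,
                      matches_malicious_rule: bool) -> str:
    """Decision to use when the ML scorer is unavailable or times out."""
    # ASSUMPTION: the allowlist takes precedence over the malicious check.
    if is_allowlisted_internal:
        return "allow"          # fail open for critical internal services
    if matches_malicious_rule:
        return "reject"         # fail closed for clearly malicious patterns
    # ASSUMPTION: all other traffic falls back to the static per-key quota.
    return "static-quota"


if __name__ == "__main__":
    bounds = LimitBounds(min_rps=500.0, max_rps=2_000.0)
    print(clamp_limit(100.0, bounds))     # tightened too far, raised to floor
    print(clamp_limit(5_000.0, bounds))   # relaxed too far, capped at ceiling
    print(fallback_decision(False, True))
```

Placing the clamp after the model rather than inside it means contractual limits hold even if the model misbehaves or is rolled back.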