Problem
Design a rate limiter for the Meta Graph API that enforces limits at millions of requests per second across multiple regions. The system is part of Meta's security infrastructure, so the goal is not only fairness and abuse prevention but also resilience during attacks, partial outages, and sudden traffic spikes.
Requirements
- Enforce limits for multiple dimensions, such as app ID, user access token, IP / subnet, and API endpoint.
- Support different policies: global quotas, per-second burst limits, and rolling-window limits.
- Make allow/deny decisions with very low latency on the request path.
- Stay within acceptable error bounds under regional failover, cache loss, clock skew, and backend degradation.
- Prevent common bypasses such as key rotation, distributed abuse across many IPs, and hot-key concentration.
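The multi-dimension and burst-limit requirements can be illustrated with a minimal in-process sketch: one token bucket per (dimension, value) pair, where a request is admitted only if every bucket it touches has a token. The class names, the policy table, and the numbers are hypothetical; a real deployment would keep these counters in a shared store rather than a Python dict.

```python
from typing import Dict, Tuple


class TokenBucket:
    """Refills `rate` tokens per second, holding at most `capacity` (the burst size)."""

    def __init__(self, rate: float, capacity: float, now: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.updated = now

    def refill(self, now: float) -> None:
        # Ignore timestamps that move backwards (e.g. clock skew) instead of
        # granting tokens for negative elapsed time.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = max(self.updated, now)


class MultiDimensionLimiter:
    """Allows a request only if every dimension it touches has budget left."""

    def __init__(self, policies: Dict[str, Tuple[float, float]]) -> None:
        # Hypothetical policy table: dimension -> (tokens per second, burst capacity).
        self.policies = policies
        self.buckets: Dict[Tuple[str, str], TokenBucket] = {}

    def _bucket(self, dimension: str, value: str, now: float) -> TokenBucket:
        key = (dimension, value)
        if key not in self.buckets:
            rate, capacity = self.policies[dimension]
            self.buckets[key] = TokenBucket(rate, capacity, now)
        return self.buckets[key]

    def allow(self, request_keys: Dict[str, str], now: float, cost: float = 1.0) -> bool:
        buckets = [self._bucket(d, v, now) for d, v in request_keys.items()]
        for bucket in buckets:
            bucket.refill(now)
        # Check every dimension before charging any, so a denial on one
        # dimension does not drain the others.
        if any(bucket.tokens < cost for bucket in buckets):
            return False
        for bucket in buckets:
            bucket.tokens -= cost
        return True


limiter = MultiDimensionLimiter({"app_id": (100.0, 10.0), "endpoint": (1000.0, 50.0)})
request = {"app_id": "app-123", "endpoint": "write-endpoint"}
burst = [limiter.allow(request, now=0.0) for _ in range(12)]
print(burst)  # ten True, then two False: the app's burst capacity is 10
```

Checking every bucket before charging any keeps a request that is denied on one dimension from consuming budget on the others. This version is single-threaded; with concurrent callers or a shared counter store, the check and the charge would have to happen atomically.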
What to Cover
Explain your design for:
- request-path architecture and where enforcement happens
- choice of algorithm (for example, token bucket, leaky bucket, or sliding-window log/counter)
- state storage strategy for counters at very high QPS
- sharding, replication, and multi-region behavior
- consistency model and acceptable error bounds
- handling of retries, idempotency, and race conditions
- observability, alerting, and operational controls
- security concerns, including abuse detection hooks and fail-open vs fail-closed decisions
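For the algorithm choice, a sliding-window counter is one common compromise between a fixed window (cheap, but it can admit up to twice the limit across a window boundary) and a sliding-window log (exact, but it stores one timestamp per request). The sketch below keeps two counters per key and assumes requests in the previous window were spread evenly across it; `SlidingWindowCounter` and its parameters are illustrative, not an existing library.

```python
import math


class SlidingWindowCounter:
    """Approximates a rolling-window limit from two fixed-window counters.

    The previous window's count is weighted by how much of that window still
    overlaps the rolling window, assuming its requests were evenly spread.
    """

    def __init__(self, limit: int, window: float) -> None:
        self.limit = limit
        self.window = window
        self.current_index = 0  # integer index of the current fixed window
        self.current_count = 0
        self.previous_count = 0

    def _advance(self, now: float) -> None:
        index = math.floor(now / self.window)
        if index <= self.current_index:
            return  # same window, or a stale timestamp: keep the counters
        # Carry the count forward only if the windows are adjacent; otherwise
        # at least one whole window passed with no traffic.
        self.previous_count = self.current_count if index == self.current_index + 1 else 0
        self.current_index = index
        self.current_count = 0

    def estimate(self, now: float) -> float:
        self._advance(now)
        elapsed = now - self.current_index * self.window
        elapsed = min(self.window, max(0.0, elapsed))
        overlap = (self.window - elapsed) / self.window
        return self.previous_count * overlap + self.current_count

    def allow(self, now: float) -> bool:
        if self.estimate(now) + 1 > self.limit:
            return False
        self.current_count += 1
        return True


counter = SlidingWindowCounter(limit=100, window=60.0)
first = sum(counter.allow(30.0) for _ in range(150))   # 100: fills the limit
second = sum(counter.allow(75.0) for _ in range(150))  # 25: 100 * 0.75 still counts
print(first, second)
```

At t=75 the rolling window still overlaps three quarters of the previous window, so 75 of its 100 requests are counted and only 25 new ones fit.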
Example
A single app suddenly sends 8M requests/sec to a write-heavy Graph API endpoint from many edge locations. Describe how your system detects the surge, applies the correct per-app and per-endpoint limits, avoids overloading shared infrastructure, and still protects legitimate traffic.
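For this example, one way to apply a single per-app limit across many edge locations is for a control plane to split the global budget among edges using max-min fairness, with each edge enforcing its share locally (for example, with a token bucket). The function name, edge names, and the 60,000 requests/sec limit below are hypothetical.

```python
from typing import Dict


def max_min_fair(global_limit: float, demand: Dict[str, float]) -> Dict[str, float]:
    """Split a global per-second budget across edges by max-min fairness.

    Edges asking for less than an equal share get all they asked for; the
    leftover is divided evenly among the edges that want more.
    """
    allocation = {edge: 0.0 for edge in demand}
    remaining = float(global_limit)
    active = {edge for edge, wanted in demand.items() if wanted > 0}
    while active and remaining > 0:
        share = remaining / len(active)
        satisfied = {edge for edge in active if demand[edge] <= share}
        if not satisfied:
            # Every remaining edge wants more than an equal share: cap them all.
            for edge in active:
                allocation[edge] = share
            break
        for edge in satisfied:
            allocation[edge] = float(demand[edge])
            remaining -= demand[edge]
        active -= satisfied
    return allocation


# Hypothetical numbers: an 8M req/s surge from one app, concentrated on two
# edges, against an assumed per-app limit of 60,000 req/s.
demand = {"edge-a": 5_000_000, "edge-b": 2_990_000, "edge-c": 10_000}
budgets = max_min_fair(60_000, demand)
print(budgets)  # {'edge-a': 25000.0, 'edge-b': 25000.0, 'edge-c': 10000.0}
```

Under these assumptions, shared backends see roughly the per-app budget instead of the full surge, and an edge with modest demand keeps all of it. Because budgets are recomputed only as often as edges report demand, the global total can drift from the limit between updates; that lag is one of the error bounds the design has to state.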
Be explicit about trade-offs. A strong answer should separate the fast path from the control plane and justify where approximate counting is acceptable versus where strict enforcement is required.
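Where approximate counting is acceptable, such as flagging hot keys or spotting abuse spread across many IPs, a count-min sketch is one option: it tracks counts for an unbounded key space in fixed memory and can overcount but never undercount. With width ⌈e/ε⌉ and depth ⌈ln(1/δ)⌉, each estimate exceeds the true count by at most εN (N being the total of all counts) with probability at least 1 - δ. The class below is a minimal illustration, not a specific production component.

```python
import hashlib


class CountMinSketch:
    """Approximate per-key counts in fixed memory; estimates never undercount."""

    def __init__(self, width: int, depth: int) -> None:
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _columns(self, key: str):
        # One hash per row, derived by salting BLAKE2b with the row number.
        for row in range(self.depth):
            digest = hashlib.blake2b(
                key.encode(), digest_size=8, salt=row.to_bytes(16, "little")
            ).digest()
            yield row, int.from_bytes(digest, "little") % self.width

    def add(self, key: str, count: int = 1) -> None:
        for row, column in self._columns(key):
            self.table[row][column] += count

    def estimate(self, key: str) -> int:
        # Collisions only add to a cell, so every row gives an upper bound;
        # the minimum across rows is the tightest one available.
        return min(self.table[row][column] for row, column in self._columns(key))


sketch = CountMinSketch(width=2048, depth=4)
sketch.add("app:123", 5000)
for i in range(500):
    sketch.add(f"ip:198.51.100.{i % 256}")
print(sketch.estimate("app:123"))  # at least 5000; higher only if collisions occur
```

Because an attacker who knows the hash function could choose keys that collide with a victim's and inflate the victim's estimate, a deployment would likely pass a secret `key=` to `hashlib.blake2b`. Strictly enforced quotas, such as write limits, would still use exact per-key counters.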