## Product Context
Meta exposes large-scale APIs such as the Graph API, Marketing API, and WhatsApp Business Platform to millions of external developers and enterprise integrations. Design an ML-driven rate-limit orchestration system that decides which requests to admit, defer, batch, or shed so client integrations remain reliable while Meta controls infrastructure cost and protects backend services.
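The four admission decisions above can be sketched as a minimal policy interface. This is a toy sketch, not Meta's implementation; all class, field, and threshold names are hypothetical:

```python
from dataclasses import dataclass
from enum import Enum

class Decision(Enum):
    ADMIT = "admit"   # forward to the backend immediately
    DEFER = "defer"   # return 429 with a Retry-After hint
    BATCH = "batch"   # coalesce with similar pending requests
    SHED = "shed"     # drop under overload; client must retry

@dataclass
class RequestContext:
    app_id: str
    endpoint: str           # e.g. "/v19.0/{id}/feed"
    predicted_value: float  # ML score: business value of admitting
    predicted_risk: float   # ML score: backend-load / abuse risk

def decide(ctx: RequestContext, backend_util: float) -> Decision:
    """Toy policy: trade predicted value against live backend utilization."""
    if backend_util > 0.95:
        return Decision.SHED if ctx.predicted_value < 0.5 else Decision.DEFER
    if ctx.predicted_risk > 0.8:
        return Decision.DEFER
    if ctx.predicted_value < 0.2 and backend_util > 0.7:
        return Decision.BATCH
    return Decision.ADMIT
```

A design answer would replace the hand-set thresholds with a learned policy, but the four-way decision surface stays the same.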
## Scale
| Signal | Value |
|---|---|
| Monthly active apps | 12M |
| Daily active apps | 3.5M |
| Peak inbound API QPS | 22M requests/sec |
| Distinct app-user tokens/day | 900M |
| Downstream protected services | 150+ internal services |
| Per-request decision latency budget (p99) | 25ms |
| Historical logs retained for training | 180 days |
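A quick back-of-envelope on these figures, combined with the cost target in Constraints, bounds what the serving stack can afford. The three-stage latency split below is an assumption for illustration, not part of the spec:

```python
peak_qps = 22_000_000        # from the Scale table
cost_per_request = 0.00002   # dollars; cost target from Constraints
latency_budget_ms = 25       # p99 per-request decision budget

peak_spend_per_sec = peak_qps * cost_per_request
print(f"Peak ML serving spend ceiling: ${peak_spend_per_sec:,.0f}/sec")  # ~ $440/sec

# Hypothetical split of the 25ms budget across stages (assumption)
stage_budget_ms = {"feature_fetch": 10, "model_inference": 10, "policy": 5}
assert sum(stage_budget_ms.values()) == latency_budget_ms
```

At roughly $440/sec at peak, anything heavier than a small model with cached features per request is off the table for the full traffic stream, which motivates the multi-stage funnel asked for in the Task section.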
## Task
- Clarify the product objective and define what “reliable” and “cost-efficient” mean for Meta and third-party developers.
- Design an end-to-end ML system that predicts request value/risk and makes real-time rate-limit decisions using a multi-stage architecture.
- Specify the online path, offline training pipeline, feature store design, and how the system handles tenant-level fairness, bursty traffic, and cold-start apps.
- Choose models for retrieval, ranking, and final policy/re-ranking, and explain why each stage is appropriate under the latency budget.
- Define offline and online evaluation, including guardrails for developer experience, backend protection, and cost.
- Identify major failure modes such as feature drift, training-serving skew, abuse spikes, and bad model rollouts, with detection and mitigation.
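The multi-stage architecture the bullets ask for can be sketched as a funnel: cheap, binding checks first, so the 25ms p99 budget is spent only on requests that need a model. Stage budgets and all function names are assumptions:

```python
def stage1_rules(req: dict) -> bool:
    """~1ms: hard quotas and policy constraints; ML never overrides these."""
    return req["tokens_used_today"] < req["daily_quota"]

def stage2_lightweight_score(req: dict) -> float:
    """~5ms: cheap model (e.g. a linear model over cached features)."""
    weights = {"is_batchable": -0.5, "enterprise_tier": 1.0, "recent_error_rate": -2.0}
    return sum(w * req["features"].get(name, 0.0) for name, w in weights.items())

def stage3_policy(score: float, backend_util: float) -> str:
    """~2ms: final decision combining the score with live backend utilization."""
    if backend_util > 0.95 and score < 0:
        return "shed"
    return "admit" if score >= 0 else "defer"
```

In a full answer, stage 2 would be the learned ranking model and stage 3 the policy/re-ranking layer; the point of the sketch is that each later stage sees strictly fewer requests than the one before it.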
## Constraints
- Must support hard platform quotas and policy constraints; ML can optimize within policy, not override compliance rules.
- Some labels are delayed: downstream errors, app retries, and developer churn may arrive hours or days later.
- Decisions must be explainable enough for internal operations and enterprise support escalations.
- The system should reduce unnecessary throttling while preventing cascading failures in Graph API and dependent services.
- Cost target: keep incremental ML serving cost below $0.00002 per API request on average.
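The delayed-label constraint above is usually handled by joining the decision log with outcome events only after a maturity window has elapsed, so training never sees partially-arrived labels. A minimal sketch with an assumed schema and a 48-hour window (both assumptions):

```python
from datetime import datetime, timedelta

LABEL_MATURITY = timedelta(hours=48)  # assumed wait before labels are considered final

def build_training_labels(decisions: list, outcomes: list, now: datetime) -> list:
    """Join the decision log with delayed outcomes (downstream errors, retries).
    Decisions newer than LABEL_MATURITY are held back until their labels mature."""
    outcome_by_req = {o["request_id"]: o for o in outcomes}
    labeled = []
    for d in decisions:
        if now - d["ts"] < LABEL_MATURITY:
            continue  # label not yet mature; revisit on a later run
        o = outcome_by_req.get(d["request_id"])
        # Label 1 = "good admit" (no downstream error observed), else 0.
        label = 1 if (o is None or not o["downstream_error"]) else 0
        labeled.append({**d, "label": label})
    return labeled
```

Developer churn arrives on an even longer horizon than downstream errors, so a real pipeline would likely maintain several label tables with different maturity windows rather than one.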