## Product Context
Meta exposes large-scale APIs such as the Graph API, Marketing API, and WhatsApp Business Platform to millions of external developers and enterprise integrations. Design an ML-driven rate-limit orchestration system that decides which requests to admit, defer, batch, or shed so client integrations remain reliable while Meta controls infrastructure cost and protects backend services.
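The four admission decisions above can be sketched as a minimal policy interface. This is a toy sketch, not Meta's implementation; all class, field, and threshold names are hypothetical:

```python
from dataclasses import dataclass
from enum import Enum

class Decision(Enum):
    ADMIT = "admit"   # forward to the backend immediately
    DEFER = "defer"   # return 429 with a Retry-After hint
    BATCH = "batch"   # coalesce with similar pending requests
    SHED = "shed"     # drop under overload; client must retry

@dataclass
class RequestContext:
    app_id: str
    endpoint: str           # e.g. "/v19.0/{id}/feed"
    predicted_value: float  # ML score: business value of admitting
    predicted_risk: float   # ML score: backend-load / abuse risk

def decide(ctx: RequestContext, backend_util: float) -> Decision:
    """Toy policy: trade predicted value against live backend utilization."""
    if backend_util > 0.95:
        return Decision.SHED if ctx.predicted_value < 0.5 else Decision.DEFER
    if ctx.predicted_risk > 0.8:
        return Decision.DEFER
    if ctx.predicted_value < 0.2 and backend_util > 0.7:
        return Decision.BATCH
    return Decision.ADMIT
```

A design answer would replace the hand-set thresholds with a learned policy, but the four-way decision surface stays the same.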
## Scale
| Signal | Value |
|---|---|
| Monthly active apps | 12M |
| Daily active apps | 3.5M |
| Peak inbound API QPS | 22M requests/sec |
| Distinct app-user tokens/day | 900M |
| Downstream protected services | 150+ internal services |
| Per-request decision latency budget (p99) | 25ms |
| Historical logs retained for training | 180 days |
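A quick back-of-envelope on these figures, combined with the cost target in Constraints, bounds what the serving stack can afford. The three-stage latency split below is an assumption for illustration, not part of the spec:

```python
peak_qps = 22_000_000        # from the Scale table
cost_per_request = 0.00002   # dollars; cost target from Constraints
latency_budget_ms = 25       # p99 per-request decision budget

peak_spend_per_sec = peak_qps * cost_per_request
print(f"Peak ML serving spend ceiling: ${peak_spend_per_sec:,.0f}/sec")  # ~ $440/sec

# Hypothetical split of the 25ms budget across stages (assumption)
stage_budget_ms = {"feature_fetch": 10, "model_inference": 10, "policy": 5}
assert sum(stage_budget_ms.values()) == latency_budget_ms
```

At roughly $440/sec at peak, anything heavier than a small model with cached features per request is off the table for the full traffic stream, which motivates the multi-stage funnel asked for in the Task section.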
## Task
- Clarify the product objective and define what “reliable” and “cost-efficient” mean for Meta and third-party developers.
- Design an end-to-end ML system that predicts request value/risk and makes real-time rate-limit decisions using a multi-stage architecture.
- Specify the online path, offline training pipeline, feature store design, and how the system handles tenant-level fairness, bursty traffic, and cold-start apps.
- Choose models for retrieval, ranking, and final policy/re-ranking, and explain why each stage is appropriate under the latency budget.
- Define offline and online evaluation, including guardrails for developer experience, backend protection, and cost.
- Identify major failure modes such as feature drift, training-serving skew, abuse spikes, and bad model rollouts, with detection and mitigation.
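The multi-stage architecture the bullets ask for can be sketched as a funnel: cheap, binding checks first, so the 25ms p99 budget is spent only on requests that need a model. Stage budgets and all function names are assumptions:

```python
def stage1_rules(req: dict) -> bool:
    """~1ms: hard quotas and policy constraints; ML never overrides these."""
    return req["tokens_used_today"] < req["daily_quota"]

def stage2_lightweight_score(req: dict) -> float:
    """~5ms: cheap model (e.g. a linear model over cached features)."""
    weights = {"is_batchable": -0.5, "enterprise_tier": 1.0, "recent_error_rate": -2.0}
    return sum(w * req["features"].get(name, 0.0) for name, w in weights.items())

def stage3_policy(score: float, backend_util: float) -> str:
    """~2ms: final decision combining the score with live backend utilization."""
    if backend_util > 0.95 and score < 0:
        return "shed"
    return "admit" if score >= 0 else "defer"
```

In a full answer, stage 2 would be the learned ranking model and stage 3 the policy/re-ranking layer; the point of the sketch is that each later stage sees strictly fewer requests than the one before it.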
## Constraints
- Must support hard platform quotas and policy constraints; ML can optimize within policy, not override compliance rules.
- Some labels are delayed: downstream errors, app retries, and developer churn may arrive hours or days later.
- Decisions must be explainable enough for internal operations and enterprise support escalations.
- The system should reduce unnecessary throttling while preventing cascading failures in Graph API and dependent services.
- Cost target: keep incremental ML serving cost below $0.00002 per API request on average.
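The delayed-label constraint above is usually handled by joining the decision log with outcome events only after a maturity window has elapsed, so training never sees partially-arrived labels. A minimal sketch with an assumed schema and a 48-hour window (both assumptions):

```python
from datetime import datetime, timedelta

LABEL_MATURITY = timedelta(hours=48)  # assumed wait before labels are considered final

def build_training_labels(decisions: list, outcomes: list, now: datetime) -> list:
    """Join the decision log with delayed outcomes (downstream errors, retries).
    Decisions newer than LABEL_MATURITY are held back until their labels mature."""
    outcome_by_req = {o["request_id"]: o for o in outcomes}
    labeled = []
    for d in decisions:
        if now - d["ts"] < LABEL_MATURITY:
            continue  # label not yet mature; revisit on a later run
        o = outcome_by_req.get(d["request_id"])
        # Label 1 = "good admit" (no downstream error observed), else 0.
        label = 1 if (o is None or not o["downstream_error"]) else 0
        labeled.append({**d, "label": label})
    return labeled
```

Developer churn arrives on an even longer horizon than downstream errors, so a real pipeline would likely maintain several label tables with different maturity windows rather than one.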