Design ML-Powered API Rate Limiter

Difficulty: Hard

Category: ML System Design

Product Context

NovaAPI is a public API platform used by mobile apps, SaaS vendors, and internal services. The company wants to replace static per-key quotas with an ML-driven rate limiter that predicts abusive or bursty traffic in real time and decides whether to allow, delay, challenge, or reject requests while minimizing impact on legitimate customers.

Scale

Signal	Value
Active API keys	45M
DAU (developers / apps generating traffic)	18M
Peak request rate	3.5M RPS globally
Regions	6 active-active regions
Distinct endpoints	12K
Feature freshness target	< 5s for traffic counters
End-to-end decision latency budget	p99 < 15ms
Historical logs retained for training	90 days (~22T requests)
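
A quick back-of-envelope pass over the table helps size each stage. The sketch below uses only the figures above. The even split of peak traffic across regions is an assumption made for illustration; real regional load is likely skewed.

```python
# Back-of-envelope sizing derived from the Scale table.
PEAK_RPS = 3_500_000          # peak request rate, global
REGIONS = 6                   # active-active regions
RETAINED_REQUESTS = 22e12     # ~22T requests in the training window
RETENTION_DAYS = 90
SECONDS_PER_DAY = 86_400

# Assumption: peak load is spread evenly across regions.
per_region_peak_rps = PEAK_RPS / REGIONS

# Average rate implied by the retained log volume.
avg_rps = RETAINED_REQUESTS / (RETENTION_DAYS * SECONDS_PER_DAY)

# How much headroom the peak requires relative to the implied average.
peak_to_avg_ratio = PEAK_RPS / avg_rps

print(f"per-region peak: {per_region_peak_rps:,.0f} RPS")
print(f"implied average: {avg_rps:,.0f} RPS")
print(f"peak / average:  {peak_to_avg_ratio:.2f}x")
```

Under these assumptions each region has to absorb roughly 580K RPS at peak, and the implied average of about 2.8M RPS sits close to the stated peak, so the retained logs suggest fairly sustained load rather than rare spikes.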

Task

Design an end-to-end ML system for adaptive rate limiting. Your design should address:

1. How to define the prediction problem and translate model outputs into rate-limit actions (allow, soft-throttle, hard-throttle, challenge).
2. A multi-stage online architecture for high-throughput decisioning, including fast candidate policy retrieval, ML scoring, and final policy re-ranking / rule enforcement.
3. The offline and streaming data pipelines for features, labels, training, and feedback loops, including delayed labels for abuse outcomes.
4. Model choices for each stage, with clear tradeoffs between latency, interpretability, and recall of abusive traffic.
5. Evaluation strategy: offline metrics, online experiments, and operational guardrails.
6. Failure modes, especially feature drift, training-serving skew, regional outages, and false positives on high-value customers.
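
As one possible starting point for the first item, a calibrated abuse score can be bucketed into the four actions with ordered thresholds. This is a minimal sketch, not a prescribed answer: the threshold values, the `choose_action` name, and the severity ordering (challenge placed below hard-throttle) are all assumptions.

```python
from bisect import bisect_right

# Actions ordered from least to most restrictive.
# Assumption: "challenge" is treated as milder than "hard_throttle".
ACTIONS = ("allow", "soft_throttle", "challenge", "hard_throttle")

# Hypothetical score cut points; in practice these would be tuned per
# customer tier against false-positive and abuse-recall targets.
THRESHOLDS = (0.30, 0.70, 0.90)


def choose_action(abuse_score: float) -> str:
    """Map a calibrated abuse probability in [0, 1] to a rate-limit action."""
    if not 0.0 <= abuse_score <= 1.0:
        raise ValueError(f"abuse_score must be in [0, 1], got {abuse_score}")
    # bisect_right counts how many thresholds the score has reached,
    # which is exactly the index of the matching action.
    return ACTIONS[bisect_right(THRESHOLDS, abuse_score)]


for score in (0.05, 0.30, 0.85, 0.99):
    print(score, choose_action(score))
```

Keeping the mapping as data rather than branching logic makes it easy to vary thresholds per tier or per experiment arm without retraining the model.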

Constraints

- The system must preserve existing contractual limits for enterprise customers; ML can only tighten or relax within configured bounds.
- Some abuse labels arrive hours later from downstream fraud investigations or chargebacks.
- PII cannot be used directly in model features; features must satisfy internal privacy policy.
- The platform must fail open for a small allowlisted set of critical internal services, but fail closed for clearly malicious traffic patterns.
- Serving cost target is under $0.00002 per request at peak scale.
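
Several of these constraints reduce to small, deterministic guards that sit around the model. The sketch below is illustrative only. `LimitBounds`, its field names, the `static_quota` fallback, and the precedence of the allowlist over the malicious-pattern check are assumptions, not part of the problem statement.

```python
from dataclasses import dataclass

PEAK_RPS = 3_500_000
COST_TARGET_USD_PER_REQUEST = 0.00002


@dataclass(frozen=True)
class LimitBounds:
    """Per-customer configured bounds (hypothetical field names)."""
    floor_rps: float    # ML may not tighten below this
    ceiling_rps: float  # ML may not relax above this


def effective_limit(ml_proposed_rps: float, bounds: LimitBounds) -> float:
    """Clamp the model's proposed limit into the configured bounds."""
    return min(max(ml_proposed_rps, bounds.floor_rps), bounds.ceiling_rps)


def fallback_action(is_allowlisted_internal: bool, matches_malicious_rule: bool) -> str:
    """Decision to use when the scoring path is unavailable or times out."""
    if is_allowlisted_internal:
        return "allow"          # fail open for critical internal services
    if matches_malicious_rule:
        return "reject"         # fail closed for clearly malicious patterns
    # Assumption: all other traffic reverts to the existing static quotas.
    return "static_quota"


# Spend ceiling implied by the per-request cost target at peak load.
peak_budget_usd_per_second = PEAK_RPS * COST_TARGET_USD_PER_REQUEST

enterprise = LimitBounds(floor_rps=500.0, ceiling_rps=2_000.0)
print(effective_limit(120.0, enterprise))     # clamped up to the floor
print(effective_limit(5_000.0, enterprise))   # clamped down to the ceiling
print(fallback_action(False, True))
print(f"{peak_budget_usd_per_second:.2f} USD/s at peak")
```

The clamp runs after scoring, so a model error can never push an enterprise customer outside its contract. The cost line shows that the whole serving path, features included, has to fit in roughly 70 USD per second at peak.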
