Scale Facebook Feed Recommendations

Product Context

Design the recommendation infrastructure for Facebook Feed, where users open the app and expect a personalized, low-latency ranked feed of posts, reels, photos, links, and suggested content from friends, Groups, Pages, and recommended creators. The system must support global traffic across regions while keeping recommendations fresh and relevant.

Scale

Signal	Value
DAU	1.2B
Peak feed request QPS	2.5M
Active content catalog	3B eligible posts/reels over recent windows
New content/day	250M
Candidate pool before ranking	50K-200K per request
End-to-end latency budget (p99)	150ms
Regions	North America, Europe, APAC, LATAM
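
A quick back-of-envelope pass over the figures above helps size the problem before designing anything. The sketch below uses only the table's numbers. The variable names are illustrative, and the "naive" figures assume, purely for the sake of argument, that every candidate in the pool is scored on every request.

```python
# Back-of-envelope arithmetic from the Scale table (names are illustrative).
SECONDS_PER_DAY = 24 * 60 * 60

peak_qps = 2.5e6                      # peak feed requests per second
new_content_per_day = 250e6           # new posts/reels per day
candidates_low, candidates_high = 50_000, 200_000  # pool size per request

# Average ingest rate the freshness pipeline must absorb.
new_items_per_sec = new_content_per_day / SECONDS_PER_DAY

# Scoring volume if every candidate were scored on every request (assumption).
naive_scores_low = peak_qps * candidates_low
naive_scores_high = peak_qps * candidates_high

print(f"avg new items/sec: {new_items_per_sec:,.0f}")
print(f"naive candidate scores/sec: {naive_scores_low:.2e} to {naive_scores_high:.2e}")
```

The average ingest rate comes out near 2,900 items per second, and the naive scoring volume lands between 1.25e11 and 5e11 candidate scores per second at peak, which is why the expensive models cannot see the whole pool.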

Task

Design an end-to-end ML system that can serve Facebook Feed recommendations globally at low latency.

Define the functional and non-functional requirements, including freshness, availability, and personalization goals.
Propose a multi-stage recommendation architecture from candidate generation to ranking and re-ranking, and explain how it scales across regions.
Choose models and features for each stage, including how you would handle cold start, sparse feedback, and rapidly trending content.
Design the training and serving stack, including batch vs streaming pipelines, feature storage, model deployment, and fallback behavior.
Define how you would evaluate the system offline and online, and how you would monitor drift, training-serving skew, and production failures.
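
One way to picture the multi-stage architecture asked for above is as a funnel in which each stage scores the survivors of the previous one and keeps only its top k. The sketch below is a toy illustration, not a reference design: `run_funnel`, the stage names, the scorers, and the cut-off sizes are all hypothetical.

```python
from typing import Callable, List, Tuple

Scorer = Callable[[int], float]
Stage = Tuple[str, Scorer, int]  # (stage name, scoring function, items to keep)


def run_funnel(candidates: List[int], stages: List[Stage]) -> List[int]:
    """Score survivors at each stage, keep that stage's top-k, pass them on."""
    survivors = candidates
    for _name, scorer, keep in stages:
        survivors = sorted(survivors, key=scorer, reverse=True)[:keep]
    return survivors


# Toy scorers standing in for real models (hypothetical).
def light_score(item: int) -> float:
    return -abs(item - 500)      # cheap model: runs on every candidate


def heavy_score(item: int) -> float:
    return item % 7              # expensive model: runs on few candidates


stages: List[Stage] = [
    ("light ranker", light_score, 100),
    ("heavy ranker", heavy_score, 10),
]
feed = run_funnel(list(range(1000)), stages)
print(len(feed))
```

The point of the shape is that the cheap scorer touches all 1,000 toy candidates while the expensive one touches only the 100 survivors, which mirrors the serving-cost constraint at a much smaller scale.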

Constraints

Fresh content should become eligible within 2-5 minutes of creation.
User interaction features should be updated near real time.
The system must tolerate regional outages and degrade gracefully.
Serving cost matters: the most expensive models cannot run on every candidate.
The system must support policy filtering before the final response is returned (integrity, privacy, blocked entities, and age/language constraints).
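
Two of these constraints interact: graceful degradation and policy filtering. A minimal sketch, assuming a hypothetical `serve_feed` entry point and simplified `Post` and `Viewer` records, shows one possible arrangement in which the policy filter runs last, so it also covers whatever fallback feed is served when the primary ranker fails.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List


@dataclass(frozen=True)
class Post:                      # hypothetical, simplified post record
    post_id: int
    author_id: int
    language: str
    min_age: int


@dataclass(frozen=True)
class Viewer:                    # hypothetical, simplified viewer record
    age: int
    languages: FrozenSet[str]
    blocked_authors: FrozenSet[int]


def policy_filter(viewer: Viewer, ranked: List[Post]) -> List[Post]:
    """Drop posts the viewer must not see, preserving ranked order."""
    return [
        p for p in ranked
        if p.author_id not in viewer.blocked_authors
        and p.language in viewer.languages
        and viewer.age >= p.min_age
    ]


def serve_feed(
    viewer: Viewer,
    primary_ranker: Callable[[Viewer], List[Post]],
    fallback_feed: List[Post],
) -> List[Post]:
    """Use the primary ranker; on failure degrade to a precomputed feed."""
    try:
        ranked = primary_ranker(viewer)
    except Exception:
        ranked = fallback_feed   # degraded but still served
    return policy_filter(viewer, ranked)  # filtering always runs last
```

For example, a ranker that raises `TimeoutError` would cause `serve_feed` to return the filtered fallback list rather than an error.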
