Product Context
Design the recommendation infrastructure for Facebook Feed, where users open the app and expect a personalized, low-latency ranked feed of posts, reels, photos, links, and suggested content from friends, Groups, Pages, and recommended creators. The system must support global traffic across regions while keeping recommendations fresh and relevant.
Scale
| Signal | Value |
|---|---|
| DAU | 1.2B |
| Peak feed request QPS | 2.5M |
| Active content catalog | 3B eligible posts/reels over recent windows |
| New content/day | 250M |
| Candidate pool before ranking | 50K-200K per request |
| End-to-end latency budget (p99) | 150ms |
| Regions | North America, Europe, APAC, LATAM |
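A quick back-of-envelope pass over the table helps size the problem before any design work. The sketch below uses only the figures above. The variable names are illustrative, and it assumes a uniform daily average for content creation, which understates real peaks.

```python
# Back-of-envelope load estimates derived only from the Scale table.
SECONDS_PER_DAY = 24 * 60 * 60

dau = 1.2e9                       # daily active users
peak_qps = 2.5e6                  # peak feed requests per second
new_items_per_day = 250e6         # new posts/reels per day
candidates_low, candidates_high = 50_000, 200_000  # pool size per request

# Average ingest rate for new content (assumes uniform arrival; peaks are higher).
avg_new_items_per_sec = new_items_per_day / SECONDS_PER_DAY

# Candidate evaluations per second at peak if every candidate in the pool
# were touched exactly once by some scorer.
cand_evals_low = peak_qps * candidates_low
cand_evals_high = peak_qps * candidates_high

# Average number of new items created per daily active user per day.
new_items_per_dau = new_items_per_day / dau

print(f"new items/sec (avg): {avg_new_items_per_sec:,.0f}")
print(f"candidate evals/sec at peak: {cand_evals_low:.2e} to {cand_evals_high:.2e}")
print(f"new items per DAU per day: {new_items_per_dau:.2f}")
```

At peak the pool implies on the order of 10^11 candidate evaluations per second, so only a very cheap first-stage scorer can plausibly touch the whole pool.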
Task
Design an end-to-end ML system that can serve Facebook Feed recommendations globally at low latency.
- Define the functional and non-functional requirements, including freshness, availability, and personalization goals.
- Propose a multi-stage recommendation architecture from candidate generation to ranking and re-ranking, and explain how it scales across regions.
- Choose models and features for each stage, including how you would handle cold start, sparse feedback, and rapidly trending content.
- Design the training and serving stack, including batch vs streaming pipelines, feature storage, model deployment, and fallback behavior.
- Define how you would evaluate the system offline and online, and how you would monitor drift, training-serving skew, and production failures.
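As a starting point for the multi-stage architecture item, here is a minimal sketch of a two-stage funnel: a cheap scorer prunes the full pool, and a more expensive scorer runs only on the shortlist. The function name `funnel_rank`, the stage sizes, and the toy scorers are all hypothetical; a real answer would add candidate generation, re-ranking, and policy filtering around this core.

```python
import heapq
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def funnel_rank(
    candidates: Iterable[T],
    cheap_score: Callable[[T], float],
    expensive_score: Callable[[T], float],
    light_k: int,
    final_k: int,
) -> List[T]:
    """Two-stage ranking: prune with a cheap scorer, then refine the shortlist."""
    # Stage 1: the cheap scorer sees every candidate; keep the top light_k.
    shortlist = heapq.nlargest(light_k, candidates, key=cheap_score)
    # Stage 2: the expensive scorer sees only the shortlist.
    return heapq.nlargest(final_k, shortlist, key=expensive_score)


# Toy usage: count how many times the expensive scorer is actually invoked.
calls = {"heavy": 0}


def heavy(x: int) -> float:
    calls["heavy"] += 1
    return -float(x)  # stand-in for a costly model; prefers smaller ids


result = funnel_rank(range(1000), lambda x: float(x), heavy, light_k=50, final_k=5)
print(result, calls["heavy"])
```

The choice of `light_k` is what trades ranking quality against serving cost, since it bounds how many items reach the expensive model per request.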
Constraints
- Fresh content should become eligible within 2-5 minutes of creation.
- User interaction features should be updated in near real time.
- The system must tolerate regional outages and degrade gracefully.
- Serving cost matters: the most expensive models cannot run on every candidate.
- Must support policy filtering before final response (integrity, privacy, blocked entities, age/language constraints).
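The policy-filtering and graceful-degradation constraints can be combined in the final response step, sketched below under stated assumptions. The `Item` and `Viewer` fields, the convention that `ranked is None` signals a failed or timed-out ranking path, and the precomputed `fallback` list are all hypothetical. Integrity and privacy checks are omitted because they depend on services not described here.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Sequence


@dataclass(frozen=True)
class Item:
    item_id: str
    author_id: str
    language: str
    min_age: int = 0  # minimum viewer age; 0 means unrestricted


@dataclass(frozen=True)
class Viewer:
    user_id: str
    age: int
    languages: FrozenSet[str]
    blocked_authors: FrozenSet[str]


def passes_policy(item: Item, viewer: Viewer) -> bool:
    """Hard eligibility checks applied to every item before it is returned."""
    if item.author_id in viewer.blocked_authors:
        return False
    if viewer.age < item.min_age:
        return False
    if item.language not in viewer.languages:
        return False
    return True


def build_response(
    viewer: Viewer,
    ranked: Optional[Sequence[Item]],
    fallback: Sequence[Item],
    k: int,
) -> List[Item]:
    """Return up to k policy-compliant items, degrading to the fallback list."""
    # Assumption: ranked is None when the primary ranking path failed or
    # exceeded its latency budget; the fallback might be a cached or popular list.
    source = ranked if ranked is not None else fallback
    # The policy filter runs on both paths, so degraded responses stay compliant.
    return [item for item in source if passes_policy(item, viewer)][:k]
```

Filtering after the fallback selection, rather than before it, means a regional outage can lower relevance without bypassing the policy checks.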