Product Context
Design the ML system behind Facebook People You May Know (PYMK), which recommends potential friends across surfaces like Home, Notifications, and the Friends tab. The goal is to help users form meaningful connections while avoiding spammy, low-quality, or privacy-sensitive suggestions.
Scale
| Signal | Value |
|---|---|
| Daily active users (DAU) | 1.2B Facebook users |
| Peak recommendation requests | 900K QPS across surfaces |
| Social graph size | 3B+ accounts, 400B+ edges |
| Candidate pool per user | millions of reachable users before filtering |
| Suggestions shown per request | 10-20 |
| End-to-end latency budget (p99) | 150ms |
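
The figures in the table imply a few derived quantities that are useful when sizing the system. The sketch below is a rough back-of-envelope calculation, not a capacity plan. It treats "3B+" and "400B+" as lower bounds, and two of its inputs are assumptions that do not come from the table: that edges are undirected, and that each stored ID takes 8 bytes (`BYTES_PER_ID`).

```python
# Inputs copied from the Scale table; "3B+" and "400B+" are treated as lower bounds.
PEAK_QPS = 900_000
SUGGESTIONS_PER_REQUEST = (10, 20)
ACCOUNTS = 3_000_000_000
EDGES = 400_000_000_000

# Assumption (not in the source): each edge endpoint is stored as an 8-byte ID.
BYTES_PER_ID = 8


def scale_estimates():
    """Return rough derived quantities from the table values above."""
    low, high = SUGGESTIONS_PER_REQUEST
    return {
        # Final suggestions that must be produced per second at peak.
        "suggestions_per_sec": (PEAK_QPS * low, PEAK_QPS * high),
        "edges_per_account": EDGES / ACCOUNTS,
        # Assumes undirected edges, so each edge appears in two adjacency lists.
        "mean_degree_if_undirected": 2 * EDGES / ACCOUNTS,
        "adjacency_bytes_if_undirected": 2 * EDGES * BYTES_PER_ID,
    }


if __name__ == "__main__":
    for name, value in scale_estimates().items():
        print(name, value)
```

Under these assumptions, peak load works out to roughly 9M to 18M final suggestions per second, and raw adjacency storage to about 6.4 TB before indexes or replication.
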
Task
- Clarify the product objective, prediction target, and key constraints for PYMK.
- Estimate system scale and design a multi-stage architecture from candidate generation to ranking and re-ranking.
- Choose models for each stage and explain what features are computed online vs offline.
- Design the training data pipeline, labels, and feedback loop while addressing delayed outcomes and negative sampling.
- Define offline and online evaluation, including guardrails for user trust, privacy, and ecosystem health.
- Identify major failure modes such as feature drift, training-serving skew, graph abuse, and cold-start, and explain mitigations.
Constraints
- Must exclude blocked users, existing friends, recent rejects, integrity-bad accounts, and policy-restricted pairs.
- Suggestions should reflect fresh graph activity within minutes, but serving must stay within strict latency and cost limits.
- New users and low-activity users have sparse history; the system still needs reasonable recommendations.
- The system should optimize for accepted friend requests and long-term connection quality, not just clicks on suggestions.
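
The exclusion constraint above is a hard filter rather than a ranking signal, so one way to picture it is as a set-membership check applied to candidates before scoring. The following is a minimal sketch under that assumption. `ExclusionSets`, `filter_candidates`, and all field names are hypothetical, and a production system would more likely back these lookups with indexed stores than with in-memory sets.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set, Tuple


@dataclass
class ExclusionSets:
    """Hypothetical per-viewer exclusion data; all field names are illustrative."""
    blocked: Set[int] = field(default_factory=set)
    friends: Set[int] = field(default_factory=set)
    recent_rejects: Set[int] = field(default_factory=set)
    integrity_bad: Set[int] = field(default_factory=set)
    # Policy-restricted pairs are stored as (smaller_id, larger_id).
    policy_restricted_pairs: Set[Tuple[int, int]] = field(default_factory=set)


def filter_candidates(viewer_id: int, candidates: Iterable[int],
                      ex: ExclusionSets) -> List[int]:
    """Drop candidates that violate any hard constraint, preserving input order."""
    banned = ex.blocked | ex.friends | ex.recent_rejects | ex.integrity_bad
    kept = []
    for cand in candidates:
        if cand == viewer_id or cand in banned:
            continue
        pair = (min(viewer_id, cand), max(viewer_id, cand))
        if pair in ex.policy_restricted_pairs:
            continue
        kept.append(cand)
    return kept


if __name__ == "__main__":
    ex = ExclusionSets(blocked={2}, friends={3}, recent_rejects={4},
                       integrity_bad={5}, policy_restricted_pairs={(1, 6)})
    print(filter_candidates(1, [1, 2, 3, 4, 5, 6, 7, 8], ex))
```

Applying the filter before ranking keeps excluded accounts out of the scoring path entirely. Whether to also re-check the constraints at serving time, to catch a block or rejection that arrived after candidate generation, is a design choice the problem leaves open.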