You are designing the recommendation system for a social app's home feed, where users see a ranked list of posts, reels, and suggested accounts when they open the app. The product already works well for heavy users, but engagement is weak for new users and for newly created content because the system has little interaction history for either. You need to improve cold-start quality without hurting the experience for established users. The business goal is to increase early-session engagement and 7-day retention while keeping feed latency within the p99 budget listed below.

| Signal | Value |
|---|---|
| DAU | 180M |
| Peak feed request rate | 900K QPS |
| Active content catalog | 600M items |
| New items per day | 18M |
| New users per day | 3M |
| Per-request latency budget (p99) | 180ms |
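
As a rough orientation, the daily figures in the table can be converted into per-second rates and ratios. The sketch below is only a back-of-the-envelope calculation: it assumes arrivals are spread evenly over a 24-hour day (real traffic will be burstier) and that every new user is counted in DAU on their signup day. The function name `derived_rates` and the dictionary keys are illustrative, not part of the prompt.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Figures taken directly from the table above.
DAU = 180_000_000
CATALOG_ITEMS = 600_000_000
NEW_ITEMS_PER_DAY = 18_000_000
NEW_USERS_PER_DAY = 3_000_000


def derived_rates() -> dict[str, float]:
    """Convert the daily figures into average rates and ratios.

    Assumption: arrivals are uniform over the day, so these are
    averages, not peaks.
    """
    return {
        # Average rate at which cold items enter the catalog.
        "new_items_per_second": NEW_ITEMS_PER_DAY / SECONDS_PER_DAY,
        # Average rate at which cold users arrive.
        "new_users_per_second": NEW_USERS_PER_DAY / SECONDS_PER_DAY,
        # Fraction of the active catalog that is less than a day old.
        "daily_catalog_churn": NEW_ITEMS_PER_DAY / CATALOG_ITEMS,
        # Fraction of daily actives who signed up that day
        # (assumes every new user is counted in DAU).
        "new_user_share_of_dau": NEW_USERS_PER_DAY / DAU,
    }


if __name__ == "__main__":
    for name, value in derived_rates().items():
        print(f"{name}: {value:.4g}")
```

Under these assumptions, roughly 200 new items arrive per second on average, and about 3% of the active catalog is less than a day old, which gives a sense of how much of the candidate pool has little or no interaction history at any moment.
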

How would you design this recommendation and ranking system to handle cold start for both users and items at this scale? Explain the end-to-end architecture, model and feature choices across retrieval, ranking, and re-ranking, and how you would evaluate, monitor, and safely iterate on the system.