You are leading the design of a personalized search and recommendation system for a large e-commerce marketplace. When shoppers search or browse a category, the system must retrieve and rank relevant products in real time while balancing conversion, revenue, customer experience, and seller ecosystem health. The current rules-based stack is no longer keeping up with catalog growth, seasonal demand shifts, and personalization needs. The business wants a new ML-driven system that improves purchase rate without hurting latency or availability on a core shopping surface.

| Signal | Value |
|---|---|
| DAU | 65M shoppers |
| Peak search/browse QPS | 420K requests/sec |
| Active product catalog | 180M ASINs |
| New or updated listings/day | 14M |
| Candidates scored/request | 20K retrieved → 1K ranked → 100 re-ranked |
| End-to-end latency budget (p99) | 180ms |

How would you design this end-to-end ML system, including retrieval, ranking, re-ranking, training and serving architecture, and the way you would evaluate, monitor, and operate it safely at this scale?
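
As a starting point for sizing, the figures in the table can be turned into rough aggregate loads. The sketch below is only back-of-envelope arithmetic: it assumes every peak request runs the full 20K → 1K → 100 funnel with no caching or early exit, and that listing updates are spread evenly over the day. Neither assumption is stated in the prompt. The names `per_stage_throughput`, `FUNNEL`, and `stage_load` are illustrative, not part of any existing system.

```python
# Figures copied from the table above.
PEAK_QPS = 420_000                 # peak search/browse requests per second
FUNNEL = {                         # candidates handled per request at each stage
    "retrieved": 20_000,
    "ranked": 1_000,
    "re_ranked": 100,
}
DAILY_LISTING_UPDATES = 14_000_000 # new or updated listings per day
SECONDS_PER_DAY = 86_400


def per_stage_throughput(qps: int, funnel: dict[str, int]) -> dict[str, int]:
    """Candidates per second each stage must handle if every request runs the full funnel."""
    return {stage: qps * candidates for stage, candidates in funnel.items()}


stage_load = per_stage_throughput(PEAK_QPS, FUNNEL)

# Assumption: updates arrive uniformly; real traffic is likely burstier.
avg_updates_per_sec = DAILY_LISTING_UPDATES / SECONDS_PER_DAY

for stage, load in stage_load.items():
    print(f"{stage}: {load:.2e} candidates/sec at peak")
print(f"catalog updates: ~{avg_updates_per_sec:.0f}/sec on average")
```

Under these assumptions the funnel works out to roughly 8.4 billion retrieved, 420 million ranked, and 42 million re-ranked candidates per second at peak, plus about 162 listing updates per second on average. These are upper-bound planning numbers for discussing per-stage model cost and index freshness, not measured loads.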