Product Context
ShopSphere is a large ecommerce marketplace with personalized recommendation surfaces on the home page, product detail page, cart page, and post-purchase page. Users expect relevant products across a catalog with fast-changing inventory, promotions, and seasonal demand.
Scale
| Signal | Value |
|---|---|
| DAU | 35M |
| Peak recommendation QPS | 180K |
| Total catalog | 120M SKUs |
| In-stock active catalog | 18M SKUs |
| New or updated items/day | 4M |
| End-to-end p99 latency budget | 150 ms |
| Recommendation slots/request | 20-50 items |
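
The table values imply a few derived quantities that may help when sizing the system. The following is a rough back-of-envelope sketch using only the numbers above. The variable names are illustrative, and spreading item updates uniformly over the day is an assumption (real update traffic is likely bursty).

```python
# Back-of-envelope quantities derived from the Scale table.
PEAK_QPS = 180_000
SLOTS_MIN, SLOTS_MAX = 20, 50
UPDATES_PER_DAY = 4_000_000
TOTAL_SKUS = 120_000_000
ACTIVE_SKUS = 18_000_000
SECONDS_PER_DAY = 24 * 60 * 60

# Items returned per second at peak (final slots only, not candidates scored).
items_served_min = PEAK_QPS * SLOTS_MIN   # 3.6M items/s
items_served_max = PEAK_QPS * SLOTS_MAX   # 9.0M items/s

# Average catalog update rate, ASSUMING updates are uniform over the day.
avg_updates_per_sec = UPDATES_PER_DAY / SECONDS_PER_DAY

# Share of the catalog that is in stock and active.
active_fraction = ACTIVE_SKUS / TOTAL_SKUS

print(f"items served/s at peak: {items_served_min:,} to {items_served_max:,}")
print(f"average updates/s:      {avg_updates_per_sec:.1f}")
print(f"active catalog share:   {active_fraction:.0%}")
```

These figures count only the items returned to the user. The number of candidates scored per request in earlier stages is a design decision and will be considerably larger.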
Task
Design an end-to-end recommendation system from retrieval through ranking and serving. Address the following:
- Clarify the product goals, request types, and success metrics across recommendation surfaces.
- Propose a multi-stage architecture for candidate generation, ranking, and re-ranking, including how to handle cold start, inventory changes, and business rules.
- Design the offline and online data/feature pipelines, including training cadence, feature freshness, and how you avoid training-serving skew.
- Choose models for each stage and justify them against latency, scale, and maintainability constraints.
- Define how the system is served in production, including caching, fallback behavior, capacity planning, and latency allocation.
- Explain offline evaluation, online experimentation, monitoring, and the top failure modes you would expect at this scale.
Constraints
- Out-of-stock items must never be shown.
- Promotions and price changes should be reflected within 10 minutes.
- 25% of daily traffic comes from logged-out or low-history users.
- Cost matters: GPU use is allowed only for the heaviest stage if clearly justified.
- The system must support regional compliance requirements, including filtering restricted items by market.
- Merchandising teams need limited rule-based overrides for campaigns without fully bypassing relevance.
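
To make the distinction between hard and soft constraints concrete, here is a minimal sketch of a final eligibility pass. It treats stock and market restrictions as hard filters and treats merchandising as a capped score boost. It is one possible reading of the constraints, not a prescribed design. `Candidate`, `finalize`, `restricted_markets`, `campaign_boost`, and the `MAX_CAMPAIGN_BOOST` value are all hypothetical names and numbers introduced for illustration.

```python
from dataclasses import dataclass

# Assumed cap on merchandising influence; the constraints give no number.
MAX_CAMPAIGN_BOOST = 1.2


@dataclass
class Candidate:
    sku: str
    score: float                      # relevance score from the ranker
    in_stock: bool
    restricted_markets: frozenset = frozenset()
    campaign_boost: float = 1.0       # requested by merchandising rules


def finalize(candidates, market, k):
    """Drop ineligible items, apply a bounded campaign boost, return top k."""
    # Hard constraints: never show out-of-stock or market-restricted items.
    eligible = [
        c for c in candidates
        if c.in_stock and market not in c.restricted_markets
    ]

    # Soft constraint: clamp the boost so campaigns cannot bypass relevance.
    def boosted(c):
        boost = min(max(c.campaign_boost, 1.0), MAX_CAMPAIGN_BOOST)
        return c.score * boost

    return sorted(eligible, key=boosted, reverse=True)[:k]


if __name__ == "__main__":
    demo = [
        Candidate("a", 0.9, in_stock=False),
        Candidate("b", 0.8, True, frozenset({"DE"})),
        Candidate("c", 0.5, True, campaign_boost=5.0),
        Candidate("d", 0.7, True),
    ]
    print([c.sku for c in finalize(demo, "DE", 10)])
```

In the demo, item `a` is removed for being out of stock and item `b` for being restricted in market `DE`. Item `c` requests a 5x boost but is clamped to the assumed cap, so it still ranks below the more relevant item `d`.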