## Product Context
ShopNow is a large e-commerce marketplace. The homepage, search, and product-detail pages rely on a distributed cache to serve personalized recommendations, popular products, pricing summaries, and feature vectors under tight latency budgets.
## Scale
| Signal | Value |
|---|---|
| DAU | 45M |
| Peak read QPS | 900K requests/sec |
| Peak write/invalidation QPS | 120K events/sec |
| Active product catalog | 180M SKUs |
| Personalized cache keys | ~2.5B active/day |
| End-to-end p99 latency budget | 120ms |
| Cache memory budget | 14 TB across regions |
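The figures above already pin down some capacity math worth doing before any design. A quick back-of-envelope check (the hit-rate target below is an illustrative assumption, not part of the spec):

```python
# Back-of-envelope capacity check using the scale figures above.
# The hit-rate target is an illustrative assumption, not from the spec.

TOTAL_MEMORY_BYTES = 14 * 1024**4   # 14 TB cache budget across regions
ACTIVE_KEYS = 2.5e9                 # ~2.5B active personalized keys/day
PEAK_READ_QPS = 900_000
ASSUMED_HIT_RATE = 0.95             # illustrative target, not given

# Average memory available per active key if every key were resident.
bytes_per_key = TOTAL_MEMORY_BYTES / ACTIVE_KEYS
print(f"avg budget per key: {bytes_per_key:.0f} bytes")  # ~6.2 KB

# Miss traffic the origin/recommendation tier must absorb at peak.
miss_qps = PEAK_READ_QPS * (1 - ASSUMED_HIT_RATE)
print(f"origin load at {ASSUMED_HIT_RATE:.0%} hit rate: {miss_qps:,.0f} QPS")
```

The ~6 KB-per-key figure shows why admission matters: large feature vectors or oversized entries can blow the budget even at modest key counts, so "what to cache" is as much about bytes as about popularity.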
## Task
Design an ML-driven caching strategy that decides what to cache, where to cache, and when to evict or refresh for high-traffic product surfaces. Assume not all objects fit in memory, request patterns are highly skewed, and popularity changes quickly during promotions.
Your design should address:
- How you would frame the problem and define the prediction targets for cache admission, TTL selection, and eviction priority
- A multi-stage architecture for cache candidate retrieval, ranking, and re-ranking under strict latency limits
- The offline and online data pipelines, including labels, feature computation, and training cadence
- The online serving design, including feature store usage, cache hierarchy, fallback behavior, and capacity planning
- How you would evaluate the system offline and online, and how you would monitor drift, skew, and operational failures
- Key tradeoffs around hit rate, freshness, cost, and model complexity
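One way to make the first bullet concrete is to treat admission, TTL selection, and eviction priority as three outputs of a single per-key scoring policy. A minimal sketch under assumed features and thresholds (all names and constants here are hypothetical, not prescribed by the task):

```python
from dataclasses import dataclass

@dataclass
class KeyFeatures:
    # Hypothetical, privacy-approved features for one cache key.
    recent_hits_per_min: float   # short-horizon popularity signal
    predicted_reuse_prob: float  # model output: P(re-request within TTL)
    object_size_bytes: int
    update_rate_per_min: float   # how often the underlying value changes

def admission_decision(f: KeyFeatures, admit_threshold: float = 0.3):
    """Return (admit, ttl_seconds, eviction_priority) for one key.

    Sketch only: admit when predicted reuse justifies the space,
    derive TTL from the update rate so staleness stays bounded,
    and rank eviction by expected hits saved per byte.
    """
    admit = f.predicted_reuse_prob >= admit_threshold
    # TTL: roughly one expected update interval, clamped to [30s, 15min].
    if f.update_rate_per_min == 0:
        ttl = 900.0
    else:
        ttl = min(900.0, max(30.0, 60.0 / f.update_rate_per_min))
    # Eviction priority: higher = keep longer (expected hits per byte).
    priority = f.recent_hits_per_min * f.predicted_reuse_prob / max(1, f.object_size_bytes)
    return admit, ttl, priority
```

A real answer would replace each heuristic with a learned predictor (reuse probability for admission, update-interval regression for TTL, a value-per-byte score for eviction), but the interface above is the shape the three prediction targets take.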
## Constraints
- Product price and inventory updates must propagate within 2 minutes for 99% of SKUs
- Personalized cache entries may use only privacy-approved features; no raw PII in cache keys or model features
- The system must support regional caches with partial catalog overlap
- During major sales events, request distribution can shift 10x within 15 minutes
- If the ML policy is unavailable, the platform must fall back to deterministic caching rules without causing outages
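The last constraint implies the ML policy sits behind a guard that degrades to fixed rules. A sketch of what that deterministic fallback could look like, assuming an LRU cache with a fixed TTL (class name and rule values are illustrative):

```python
import time
from collections import OrderedDict

class FallbackCache:
    """LRU cache with a fixed TTL, used when the ML policy is unreachable.

    Deterministic rules (illustrative): admit everything that fits,
    evict least-recently-used, and expire entries after a fixed TTL so
    price/inventory staleness stays bounded without any model.
    """

    def __init__(self, capacity: int, ttl_seconds: float = 60.0):
        self.capacity = capacity
        self.ttl = ttl_seconds
        self._store: "OrderedDict[str, tuple[float, object]]" = OrderedDict()

    def get(self, key: str):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self._store[key]          # expired: treat as a miss
            return None
        self._store.move_to_end(key)      # refresh LRU position
        return value

    def put(self, key: str, value) -> None:
        if key in self._store:
            self._store.move_to_end(key)
        self._store[key] = (time.monotonic() + self.ttl, value)
        while len(self._store) > self.capacity:
            self._store.popitem(last=False)  # evict least-recently-used
```

Because the fallback never consults the model, a policy-service outage degrades hit rate but cannot take reads down, which is the behavior the constraint asks for.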