## Product Context
ShopNow is a large e-commerce marketplace. The homepage, search, and product-detail pages rely on a distributed cache to serve personalized recommendations, popular products, pricing summaries, and feature vectors under tight latency budgets.
## Scale
| Signal | Value |
|---|---|
| DAU | 45M |
| Peak read QPS | 900K requests/sec |
| Peak write/invalidation QPS | 120K events/sec |
| Active product catalog | 180M SKUs |
| Personalized cache keys | ~2.5B active/day |
| End-to-end p99 latency budget | 120ms |
| Cache memory budget | 14 TB across regions |
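The figures above already pin down some capacity math worth doing before any design. A quick back-of-envelope check (the hit-rate target below is an illustrative assumption, not part of the spec):

```python
# Back-of-envelope capacity check using the scale figures above.
# The hit-rate target is an illustrative assumption, not from the spec.

TOTAL_MEMORY_BYTES = 14 * 1024**4   # 14 TB cache budget across regions
ACTIVE_KEYS = 2.5e9                 # ~2.5B active personalized keys/day
PEAK_READ_QPS = 900_000
ASSUMED_HIT_RATE = 0.95             # illustrative target, not given

# Average memory available per active key if every key were resident.
bytes_per_key = TOTAL_MEMORY_BYTES / ACTIVE_KEYS
print(f"avg budget per key: {bytes_per_key:.0f} bytes")  # ~6.2 KB

# Miss traffic the origin/recommendation tier must absorb at peak.
miss_qps = PEAK_READ_QPS * (1 - ASSUMED_HIT_RATE)
print(f"origin load at {ASSUMED_HIT_RATE:.0%} hit rate: {miss_qps:,.0f} QPS")
```

The ~6 KB-per-key figure shows why admission matters: large feature vectors or oversized entries can blow the budget even at modest key counts, so "what to cache" is as much about bytes as about popularity.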
## Task
Design an ML-driven caching strategy that decides what to cache, where to cache, and when to evict or refresh for high-traffic product surfaces. Assume not all objects fit in memory, request patterns are highly skewed, and popularity changes quickly during promotions.
Your design should address:
- How you would frame the problem and define the prediction targets for cache admission, TTL selection, and eviction priority
- A multi-stage architecture for cache candidate retrieval, ranking, and re-ranking under strict latency limits
- The offline and online data pipelines, including labels, feature computation, and training cadence
- The online serving design, including feature store usage, cache hierarchy, fallback behavior, and capacity planning
- How you would evaluate the system offline and online, and how you would monitor drift, skew, and operational failures
- Key tradeoffs around hit rate, freshness, cost, and model complexity
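One way to make the first bullet concrete is to treat admission, TTL selection, and eviction priority as three outputs of a single per-key scoring policy. A minimal sketch under assumed features and thresholds (all names and constants here are hypothetical, not prescribed by the task):

```python
from dataclasses import dataclass

@dataclass
class KeyFeatures:
    # Hypothetical, privacy-approved features for one cache key.
    recent_hits_per_min: float   # short-horizon popularity signal
    predicted_reuse_prob: float  # model output: P(re-request within TTL)
    object_size_bytes: int
    update_rate_per_min: float   # how often the underlying value changes

def admission_decision(f: KeyFeatures, admit_threshold: float = 0.3):
    """Return (admit, ttl_seconds, eviction_priority) for one key.

    Sketch only: admit when predicted reuse justifies the space,
    derive TTL from the update rate so staleness stays bounded,
    and rank eviction by expected hits saved per byte.
    """
    admit = f.predicted_reuse_prob >= admit_threshold
    # TTL: roughly one expected update interval, clamped to [30s, 15min].
    if f.update_rate_per_min == 0:
        ttl = 900.0
    else:
        ttl = min(900.0, max(30.0, 60.0 / f.update_rate_per_min))
    # Eviction priority: higher = keep longer (expected hits per byte).
    priority = f.recent_hits_per_min * f.predicted_reuse_prob / max(1, f.object_size_bytes)
    return admit, ttl, priority
```

A real answer would replace each heuristic with a learned predictor (reuse probability for admission, update-interval regression for TTL, a value-per-byte score for eviction), but the interface above is the shape the three prediction targets take.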
## Constraints
- Product price and inventory updates must propagate within 2 minutes for 99% of SKUs
- Personalized cache entries may use only privacy-approved features; no raw PII in cache keys or model features
- The system must support regional caches with partial catalog overlap
- During major sales events, request distribution can shift 10x within 15 minutes
- If the ML policy is unavailable, the platform must fall back to deterministic caching rules without causing outages
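The last constraint implies the ML policy sits behind a guard that degrades to fixed rules. A sketch of what that deterministic fallback could look like, assuming an LRU cache with a fixed TTL (class name and rule values are illustrative):

```python
import time
from collections import OrderedDict

class FallbackCache:
    """LRU cache with a fixed TTL, used when the ML policy is unreachable.

    Deterministic rules (illustrative): admit everything that fits,
    evict least-recently-used, and expire entries after a fixed TTL so
    price/inventory staleness stays bounded without any model.
    """

    def __init__(self, capacity: int, ttl_seconds: float = 60.0):
        self.capacity = capacity
        self.ttl = ttl_seconds
        self._store: "OrderedDict[str, tuple[float, object]]" = OrderedDict()

    def get(self, key: str):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self._store[key]          # expired: treat as a miss
            return None
        self._store.move_to_end(key)      # refresh LRU position
        return value

    def put(self, key: str, value) -> None:
        if key in self._store:
            self._store.move_to_end(key)
        self._store[key] = (time.monotonic() + self.ttl, value)
        while len(self._store) > self.capacity:
            self._store.popitem(last=False)  # evict least-recently-used
```

Because the fallback never consults the model, a policy-service outage degrades hit rate but cannot take reads down, which is the behavior the constraint asks for.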