Product Context
Walmart wants to improve product ranking across search and home-page recommendation surfaces so that customers see items that are both relevant and actually purchasable. The core challenge is balancing the availability of the ML serving stack against the consistency of inventory, price, and fulfillment signals across stores, fulfillment centers (FCs), and channels.
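One way to make that balance concrete is to encode it as a per-surface read policy. The Python below is a minimal sketch, not a prescribed design: the names (`Surface`, `ReadPolicy`, `POLICIES`, `can_serve_cached`) are hypothetical, and every staleness limit except the 60s local-availability target from the Scale table is an illustrative assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Surface(Enum):
    SEARCH = "search"
    BROWSE = "browse"
    CART = "cart"
    CHECKOUT = "checkout"


@dataclass(frozen=True)
class ReadPolicy:
    # Maximum tolerated age of an inventory/price signal, in seconds.
    max_staleness_s: float
    # True: read the authoritative store and fail closed on error.
    # False: serve cached/derived signals and degrade gracefully.
    require_authoritative: bool


# Hypothetical policies: top-of-funnel surfaces favor availability,
# checkout favors correctness. Only the 60s value comes from the prompt.
POLICIES = {
    Surface.SEARCH: ReadPolicy(max_staleness_s=60.0, require_authoritative=False),
    Surface.BROWSE: ReadPolicy(max_staleness_s=300.0, require_authoritative=False),
    Surface.CART: ReadPolicy(max_staleness_s=5.0, require_authoritative=False),
    Surface.CHECKOUT: ReadPolicy(max_staleness_s=0.0, require_authoritative=True),
}


def can_serve_cached(surface: Surface, signal_age_s: float) -> bool:
    """Return True if a cached signal of this age may be shown on this surface."""
    policy = POLICIES[surface]
    return not policy.require_authoritative and signal_age_s <= policy.max_staleness_s
```

Keeping the policy as data rather than scattered branching makes the search-versus-checkout split explicit and easy to review.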
Scale
| Metric | Value |
|---|---|
| DAU | 90M shoppers across app + web |
| Peak read QPS | 650K ranking requests/sec during major events |
| Peak add-to-cart / checkout QPS | 120K writes/sec |
| Active catalog | 180M SKUs |
| Store / FC locations | 5,000+ pickup / delivery nodes |
| Candidate set per request | 20K retrieved → 1K ranked → 100 re-ranked |
| End-to-end p99 latency budget | 180ms |
| Inventory freshness target | < 60s for local availability |
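A quick back-of-envelope pass over these numbers shows why per-request joins across all fulfillment nodes are ruled out. In the Python sketch below, the inputs come from the table; the per-(SKU, zone) availability rollup and the p99 latency split are assumptions added for illustration.

```python
# Inputs copied from the Scale table.
PEAK_RANKING_QPS = 650_000
RETRIEVED_PER_REQUEST = 20_000
RANKED_PER_REQUEST = 1_000
RERANKED_PER_REQUEST = 100
FULFILLMENT_NODES = 5_000
P99_BUDGET_MS = 180


def per_second(qps: int, items_per_request: int) -> int:
    """Items a stage must handle per second at the given request rate."""
    return qps * items_per_request


retrieval_candidates_s = per_second(PEAK_RANKING_QPS, RETRIEVED_PER_REQUEST)  # 1.3e10
ranking_scores_s = per_second(PEAK_RANKING_QPS, RANKED_PER_REQUEST)  # 6.5e8
rerank_scores_s = per_second(PEAK_RANKING_QPS, RERANKED_PER_REQUEST)  # 6.5e7

# Naive: join every re-ranked item against every fulfillment node per request.
naive_lookups_per_request = RERANKED_PER_REQUEST * FULFILLMENT_NODES  # 500,000
naive_lookups_s = per_second(PEAK_RANKING_QPS, naive_lookups_per_request)  # 3.25e11

# Assumption: a precomputed availability row per (SKU, customer zone), read once
# per re-ranked item, replaces the fan-out across nodes.
rollup_lookups_s = per_second(PEAK_RANKING_QPS, RERANKED_PER_REQUEST)  # 6.5e7

# Hypothetical additive p99 budget split (ms); per-stage values are assumptions.
LATENCY_SPLIT_MS = {
    "retrieval": 40,
    "feature_fetch": 30,
    "ranking": 60,
    "rerank": 30,
    "network_overhead": 20,
}
assert sum(LATENCY_SPLIT_MS.values()) == P99_BUDGET_MS
```

The 5,000x gap between `naive_lookups_s` and `rollup_lookups_s` is the cost argument for keeping the node-level join off the request path.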
Task
Design an end-to-end ML system for inventory-aware product retrieval and ranking. Your design should state explicitly where the system prefers strong consistency and where it prefers high availability with eventual consistency, and how those choices affect ML quality and user experience.
- Clarify product requirements and define what “correct enough” means for search, browse, cart, and checkout surfaces.
- Size the system and propose a multi-stage architecture: retrieval, ranking, and re-ranking.
- Define which inventory, price, fulfillment-ETA, and store-local features are computed in batch versus online, and how each is served.
- Choose models for each stage and explain how consistency constraints change feature design and serving strategy.
- Describe offline and online evaluation, including how you would detect regressions caused by stale features or training/serving skew.
- Identify key failure modes, especially around inventory inconsistency, degraded dependencies, and fallback behavior.
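For the evaluation and failure-mode items, one common pattern is to gate each online inventory feature on its age, fall back to a batch estimate when it is stale or missing, and log a fallback flag so that offline and online metrics can be sliced by it. The sketch below assumes hypothetical names (`InventoryFeature`, `ServedFeature`, `serve_in_stock_feature`, `DEFAULT_PRIOR`) and a 0.5 default prior; only the 60s freshness target comes from the Scale table.

```python
import time
from dataclasses import dataclass
from typing import Optional, Sequence

FRESHNESS_TARGET_S = 60.0  # local-availability freshness target from the Scale table
DEFAULT_PRIOR = 0.5        # hypothetical prior used when no batch estimate exists


@dataclass(frozen=True)
class InventoryFeature:
    in_stock_prob: float  # online estimate that the SKU is purchasable nearby
    event_time_s: float   # source event time of the update behind the estimate


@dataclass(frozen=True)
class ServedFeature:
    value: float
    is_fallback: bool       # logged per request so metrics can be sliced on it
    age_s: Optional[float]  # None when the online value was missing entirely


def serve_in_stock_feature(
    online: Optional[InventoryFeature],
    batch_prior: Optional[float],
    now_s: Optional[float] = None,
) -> ServedFeature:
    """Serve the online value if fresh; otherwise degrade to a batch prior."""
    now_s = time.time() if now_s is None else now_s
    prior = batch_prior if batch_prior is not None else DEFAULT_PRIOR
    if online is None:
        # Online store degraded or key missing: keep the page available.
        return ServedFeature(prior, is_fallback=True, age_s=None)
    age_s = now_s - online.event_time_s
    if age_s <= FRESHNESS_TARGET_S:
        return ServedFeature(online.in_stock_prob, is_fallback=False, age_s=age_s)
    # Stale online value: fall back, but keep its age for monitoring.
    return ServedFeature(prior, is_fallback=True, age_s=age_s)


def fallback_rate(served: Sequence[ServedFeature]) -> float:
    """Fraction of served features that used the fallback path."""
    if not served:
        return 0.0
    return sum(s.is_fallback for s in served) / len(served)
```

A sudden rise in `fallback_rate`, or a widening metric gap between fallback and non-fallback slices, points to a regression caused by stale features rather than by the model itself.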
Constraints
- Search and recommendation pages should remain available even if some inventory systems are degraded.
- Checkout and promise-date decisions require stricter correctness than top-of-funnel ranking.
- Inventory updates arrive from stores, warehouses, and third-party sellers with heterogeneous delays.
- Cost matters: the system should avoid expensive per-request joins across all fulfillment nodes.
- Price and availability decisions shown to users must be auditable for compliance.
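Taken together, the heterogeneous-delay, cost, and auditability constraints suggest maintaining availability off the request path. The sketch below is one minimal way to do that, assuming hypothetical names (`InventoryUpdate`, `AvailabilityRollup`) and a hypothetical grouping of fulfillment nodes into zones. Updates are ordered by source event time so late arrivals cannot overwrite newer counts, per-(SKU, zone) totals are precomputed so ranking needs one lookup instead of a join across all nodes, and every accepted or rejected update is appended to an audit log.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class InventoryUpdate:
    sku_id: str
    node_id: str
    zone_id: str          # assumption: each node belongs to exactly one zone
    on_hand: int
    event_time_s: float   # when the source observed the count
    source: str           # e.g. "store", "fc", "3p_seller"


@dataclass
class AvailabilityRollup:
    """Per-(SKU, zone) availability, maintained off the request path."""
    # Latest accepted update per (sku, node), ordered by event time.
    latest: Dict[Tuple[str, str], InventoryUpdate] = field(default_factory=dict)
    # Precomputed totals so ranking does one lookup per (sku, zone).
    zone_units: Dict[Tuple[str, str], int] = field(default_factory=dict)
    # Append-only audit trail; stands in for durable audit storage.
    audit_log: List[Tuple[str, InventoryUpdate]] = field(default_factory=list)

    def apply(self, u: InventoryUpdate) -> bool:
        """Apply an update unless an equal-or-newer event for that node was seen."""
        node_key = (u.sku_id, u.node_id)
        prev = self.latest.get(node_key)
        if prev is not None and prev.event_time_s >= u.event_time_s:
            self.audit_log.append(("rejected_out_of_order", u))
            return False
        zone_key = (u.sku_id, u.zone_id)
        delta = u.on_hand - (prev.on_hand if prev is not None else 0)
        self.zone_units[zone_key] = self.zone_units.get(zone_key, 0) + delta
        self.latest[node_key] = u
        self.audit_log.append(("accepted", u))
        return True

    def units_in_zone(self, sku_id: str, zone_id: str) -> int:
        return self.zone_units.get((sku_id, zone_id), 0)
```

Last-writer-wins by event time is a simplification: it assumes source clocks are comparable, and the in-memory audit list stands in for durable storage.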