Product Context
Walmart needs highly accurate, real-time inventory availability across its stores, fulfillment centers, and warehouses to power walmart.com item pages, pickup promises, substitution decisions, and internal replenishment workflows. The system must combine transactional correctness with ML-based estimation when raw inventory signals are delayed, noisy, or conflicting.
Scale
| Signal | Value |
|---|---|
| DAU touching inventory-backed surfaces | 90M |
| Peak read QPS for availability checks | 350K |
| Peak inventory update events/sec | 1.8M |
| U.S. stores + FCs + warehouses | 5,500+ nodes |
| Active SKU-location pairs | 1.2B |
| Daily inventory-affecting events | 40B |
| p99 latency budget for online availability response | 120 ms |
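
As a rough sanity check on the figures above, the following sketch derives a few ratios from the table. The derived quantities (average update rate, peak-to-average ratio, pairs per node, events per pair per day) are not stated in the prompt; they are back-of-envelope values that assume events are spread evenly, and the node count uses the 5,500 lower bound.

```python
SECONDS_PER_DAY = 86_400

def scale_estimates() -> dict:
    """Back-of-envelope ratios derived from the Scale table."""
    daily_events = 40e9            # daily inventory-affecting events
    peak_updates_per_sec = 1.8e6   # peak inventory update events/sec
    sku_location_pairs = 1.2e9     # active SKU-location pairs
    nodes = 5_500                  # lower bound; the table says "5,500+"

    avg_updates_per_sec = daily_events / SECONDS_PER_DAY
    return {
        "avg_updates_per_sec": avg_updates_per_sec,
        "peak_to_avg_ratio": peak_updates_per_sec / avg_updates_per_sec,
        "avg_pairs_per_node": sku_location_pairs / nodes,
        "events_per_pair_per_day": daily_events / sku_location_pairs,
    }

if __name__ == "__main__":
    for name, value in scale_estimates().items():
        print(f"{name}: {value:,.2f}")
```

Under these assumptions the write path averages roughly 463K events/sec, so the stated peak is a little under 4x the average, and each SKU-location pair sees about 33 events per day. Real traffic is likely far more skewed across items and nodes than these averages suggest.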
Task
Design an end-to-end ML system that provides real-time inventory availability with strong correctness guarantees while using ML to infer true sellable inventory under uncertainty.
- Clarify the product requirements and define what “strong correctness guarantees” means for Walmart’s customer-facing and operational surfaces.
- Propose a multi-stage architecture for inventory state estimation, including deterministic event processing, candidate state retrieval, ML scoring/ranking of conflicting signals, and final decisioning.
- Design the offline and online data/feature pipelines, including how Walmart would use a shared feature store and avoid training-serving skew.
- Choose models for each stage and explain when to prefer rules, probabilistic models, or learned rankers.
- Define offline and online evaluation, including precision/recall tradeoffs for in-stock vs out-of-stock decisions and experiment strategy on walmart.com and pickup flows.
- Identify failure modes such as delayed scans, duplicate events, feature drift, stale warehouse feeds, and partial regional outages.
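One possible shape for the deterministic event-processing stage is sketched below. It illustrates two of the failure modes listed above: duplicate events are dropped by event ID, and out-of-order arrivals are handled by replaying events in source-timestamp order. The names `InventoryEvent`, `Ledger`, the `"delta"`/`"snapshot"` event kinds, and the `sku@location` key format are all hypothetical, and the in-memory storage stands in for whatever durable store a real design would use.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

@dataclass(frozen=True)
class InventoryEvent:
    event_id: str      # unique per source event; used for deduplication
    key: str           # hypothetical "sku@location" identifier
    event_time: float  # source timestamp in seconds, not arrival time
    kind: str          # "delta" (sale, receipt) or "snapshot" (physical count)
    quantity: int      # signed change for "delta", absolute count for "snapshot"

@dataclass
class Ledger:
    _seen: Set[str] = field(default_factory=set)
    _events: Dict[str, List[InventoryEvent]] = field(default_factory=dict)

    def apply(self, event: InventoryEvent) -> bool:
        """Record an event; return False if it is a duplicate."""
        if event.kind not in ("delta", "snapshot"):
            raise ValueError(f"unknown event kind: {event.kind}")
        if event.event_id in self._seen:
            return False  # idempotent: replays do not change state
        self._seen.add(event.event_id)
        self._events.setdefault(event.key, []).append(event)
        return True

    def on_hand(self, key: str) -> int:
        """Replay events in source-time order, regardless of arrival order."""
        total = 0
        for e in sorted(self._events.get(key, []), key=lambda e: e.event_time):
            if e.kind == "snapshot":
                total = e.quantity   # a count resets the running total
            else:
                total += e.quantity  # sales are negative, receipts positive
        return total

if __name__ == "__main__":
    ledger = Ledger()
    key = "sku1@store7"
    ledger.apply(InventoryEvent("e1", key, 100.0, "snapshot", 10))
    ledger.apply(InventoryEvent("e2", key, 110.0, "delta", -2))
    # Arrives late but happened before the count, so the count supersedes it.
    ledger.apply(InventoryEvent("e3", key, 90.0, "delta", 5))
    ledger.apply(InventoryEvent("e2", key, 110.0, "delta", -2))  # duplicate
    print(ledger.on_hand(key))
```

Keeping the raw event log and recomputing state makes late events easy to absorb and leaves a replayable trail, at the cost of storage and replay time; a production design would likely compact the log behind periodic snapshots.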
Constraints
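- Customer-facing availability must prefer false negatives (showing an available item as out of stock) over false positives (promising an item that is not actually available) for pickup and delivery promises.
- Some source systems are eventually consistent, and their events can arrive out of order by minutes.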
- The design must support auditability for inventory decisions and rollback to deterministic rules.
- Cost matters: the ML layer cannot require per-request heavy GPU inference.
- Inventory data is business-critical and must meet strict access control and compliance requirements.
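The first, third, and fourth constraints can be combined in a small decision function, sketched here under stated assumptions. The threshold comes from standard expected-cost reasoning: promise the item only when the estimated in-stock probability p satisfies p >= c_fp / (c_fp + c_fn). The cost values, the `safety_buffer` rule, the `use_ml` switch, and the `Decision` record are hypothetical choices for illustration, not part of the prompt.

```python
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass(frozen=True)
class Decision:
    """Audit record: everything needed to explain one availability answer."""
    key: str
    available: bool
    source: str                  # "ml" or "rules"
    p_in_stock: Optional[float]  # model estimate, if one was used
    threshold: float
    ledger_on_hand: int

def promise_threshold(cost_fp: float, cost_fn: float) -> float:
    """Probability above which promising has lower expected cost."""
    return cost_fp / (cost_fp + cost_fn)

def decide(
    key: str,
    ledger_on_hand: int,
    p_in_stock: Optional[float],
    *,
    use_ml: bool = True,      # hypothetical kill switch for rollback
    cost_fp: float = 9.0,     # assumed cost of a broken promise
    cost_fn: float = 1.0,     # assumed cost of a missed sale
    safety_buffer: int = 2,   # assumed deterministic fallback rule
) -> Decision:
    threshold = promise_threshold(cost_fp, cost_fn)
    if use_ml and p_in_stock is not None:
        available, source = p_in_stock >= threshold, "ml"
    else:
        # Rollback path: deterministic rule on the ledger count only.
        available, source = ledger_on_hand >= safety_buffer, "rules"
    return Decision(key, available, source, p_in_stock, threshold, ledger_on_hand)

if __name__ == "__main__":
    print(asdict(decide("sku1@store7", 5, 0.85)))
    print(asdict(decide("sku1@store7", 5, 0.85, use_ml=False)))
```

With the assumed 9:1 cost ratio the threshold is 0.9, so a 0.85 estimate is reported as unavailable even though the item is more likely in stock than not. The comparison itself is a single float operation per request, so any cost in the ML layer sits in producing `p_in_stock`, which could come from a lightweight CPU model or a precomputed score.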