Product Context
Mercato is a global e-commerce marketplace serving buyers in 40 countries. The inventory platform must decide, in real time, whether an item is truly available to promise at checkout, despite delayed warehouse scans, order cancellations, returns, and concurrent demand spikes.
Scale
| Signal | Value |
|---|---|
| DAU | 85M |
| Peak product-page QPS | 420K |
| Peak checkout QPS | 55K |
| Active SKUs | 120M |
| Warehouses / stores / sellers | 210K nodes globally |
| Inventory mutation events | 180M/day |
| Latency budget for availability decision | 80ms p99 |
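The table above implies a heavily read-dominated workload, which shapes the serving design. A quick back-of-envelope check, using only the figures in the table (the per-second math is illustrative):

```python
# Back-of-envelope rates derived from the Scale table.
SECONDS_PER_DAY = 86_400

mutations_per_day = 180_000_000
avg_mutations_per_sec = mutations_per_day / SECONDS_PER_DAY  # ~2,083/s average

peak_pdp_qps = 420_000       # product-page availability reads
peak_checkout_qps = 55_000   # checkout availability reads

# At peak, reads outnumber writes by roughly two orders of magnitude, so the
# hot path should be a cached/replicated read with mutations applied
# asynchronously to the materialized availability state.
read_write_ratio = (peak_pdp_qps + peak_checkout_qps) / avg_mutations_per_sec

print(f"avg mutations/s:        {avg_mutations_per_sec:.0f}")
print(f"peak read QPS:          {peak_pdp_qps + peak_checkout_qps:,}")
print(f"peak read:write ratio ~ {read_write_ratio:.0f}:1")
```

Note this is an average write rate; mutation bursts during demand spikes will be far higher, which is one reason hot-SKU handling appears under failure modes.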
Task
Design an end-to-end ML system that predicts real-time sellable inventory and supports downstream ranking of fulfillment options. Your design should address:
- Requirements and scope: What exact prediction is made (e.g., in-stock probability, units available, oversell risk), who consumes it, and what SLAs matter most.
- System architecture: A multi-stage online path from candidate inventory sources to ranking and final re-ranking/policy checks, plus the offline training and feature pipelines.
- Modeling choices: What models you would use for candidate retrieval, ranking, and final decisioning; how you would combine ML with hard business rules such as reserved stock, compliance holds, and seller-specific constraints.
- Serving design: Online vs batch features, feature store design, cache strategy, latency budget allocation, and regional deployment for global traffic.
- Evaluation: Offline metrics, online experimentation or shadow testing, and how to measure business impact such as reduced oversells, improved conversion, and fulfillment reliability.
- Failure modes: How you would detect and mitigate feature drift, training-serving skew, stale events, hot SKUs, and regional outages.
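To make the "ML combined with hard business rules" expectation concrete, here is a minimal sketch of a final decisioning step. All names (`SkuState`, the field names, the 0.9 threshold) are illustrative assumptions, not part of the spec; the point is that hard rules veto first and the ML score is gated conservatively:

```python
from dataclasses import dataclass

@dataclass
class SkuState:
    # Hypothetical per-SKU state; field names are illustrative.
    last_confirmed_units: int
    reserved_units: int    # hard rule: reserved stock is never sellable
    compliance_hold: bool  # hard rule: a hold overrides any ML score

def availability_decision(state: SkuState, in_stock_prob: float,
                          threshold: float = 0.9) -> bool:
    """Return True only if the item is safe to promise at checkout.

    Hard business rules run first and can only veto; the ML in-stock
    probability is then gated by a high threshold, reflecting that
    oversells (false positives) are costlier than false negatives.
    """
    if state.compliance_hold:
        return False
    sellable = state.last_confirmed_units - state.reserved_units
    if sellable <= 0:
        return False
    # ML gate: conservative threshold encodes the oversell penalty.
    return in_stock_prob >= threshold
```

A design answer would also explain how `threshold` is tuned (e.g., per-category, from the measured cost of an oversell versus a lost sale) rather than fixed.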
Constraints
- Inventory freshness target is under 5 seconds for first-party warehouses and under 60 seconds for third-party sellers.
- Overselling high-demand items is very costly; at checkout, a false positive (promising stock that cannot be fulfilled) is worse than a false negative (withholding stock that was actually available).
- Some regions require data residency and cannot share raw user-level data across borders.
- The system must continue serving degraded but safe answers during stream lag, model outages, or warehouse feed disruptions.
- Cost matters: most requests must be served on CPU; GPUs are reserved for offline training.
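The degraded-but-safe requirement can be sketched as a fallback that checks data freshness against the per-source targets above and fails closed when the model or the stream is unhealthy. The function, field names, and the safety buffer of 3 units are illustrative assumptions:

```python
import time
from typing import Optional

# Freshness targets from the constraints, in seconds.
FRESHNESS_TARGET = {"first_party": 5.0, "third_party": 60.0}

def safe_availability(source_type: str, last_event_ts: float,
                      model_score: Optional[float], sellable_units: int,
                      now: Optional[float] = None) -> bool:
    """Degrade to a conservative, rule-only answer when events are stale
    or the model is unavailable, rather than failing open."""
    now = time.time() if now is None else now
    stale = (now - last_event_ts) > FRESHNESS_TARGET[source_type]
    if model_score is None or stale:
        # Fail safe: promise only comfortably positive confirmed stock
        # (illustrative buffer of 3 units absorbs unseen mutations).
        return sellable_units > 3
    return sellable_units > 0 and model_score >= 0.9
```

The same shape generalizes to regional outages: a region that loses its feed serves the rule-only branch from its last-known state instead of returning errors.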