Product Context
Mercato is a global e-commerce marketplace serving buyers in 40 countries. The inventory platform must decide, in real time, whether an item is truly available to promise at checkout, despite delayed warehouse scans, order cancellations, returns, and concurrent demand spikes.
Scale
| Signal | Value |
|---|---|
| DAU | 85M |
| Peak product-page QPS | 420K |
| Peak checkout QPS | 55K |
| Active SKUs | 120M |
| Warehouses / stores / sellers | 210K nodes globally |
| Inventory mutation events | 180M/day |
| Latency budget for availability decision | 80ms p99 |
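The table above implies a heavily read-dominated workload, which shapes the serving design. A quick back-of-envelope check, using only the figures in the table (the per-second math is illustrative):

```python
# Back-of-envelope rates derived from the Scale table.
SECONDS_PER_DAY = 86_400

mutations_per_day = 180_000_000
avg_mutations_per_sec = mutations_per_day / SECONDS_PER_DAY  # ~2,083/s average

peak_pdp_qps = 420_000       # product-page availability reads
peak_checkout_qps = 55_000   # checkout availability reads

# At peak, reads outnumber writes by roughly two orders of magnitude, so the
# hot path should be a cached/replicated read with mutations applied
# asynchronously to the materialized availability state.
read_write_ratio = (peak_pdp_qps + peak_checkout_qps) / avg_mutations_per_sec

print(f"avg mutations/s:        {avg_mutations_per_sec:.0f}")
print(f"peak read QPS:          {peak_pdp_qps + peak_checkout_qps:,}")
print(f"peak read:write ratio ~ {read_write_ratio:.0f}:1")
```

Note this is an average write rate; mutation bursts during demand spikes will be far higher, which is one reason hot-SKU handling appears under failure modes.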
Task
Design an end-to-end ML system that predicts real-time sellable inventory and supports downstream ranking of fulfillment options. Your design should address:
- Requirements and scope: What exact prediction is made (e.g., in-stock probability, units available, oversell risk), who consumes it, and what SLAs matter most.
- System architecture: A multi-stage online path from candidate inventory sources to ranking and final re-ranking/policy checks, plus the offline training and feature pipelines.
- Modeling choices: What models you would use for candidate retrieval, ranking, and final decisioning; how you would combine ML with hard business rules such as reserved stock, compliance holds, and seller-specific constraints.
- Serving design: Online vs batch features, feature store design, cache strategy, latency budget allocation, and regional deployment for global traffic.
- Evaluation: Offline metrics, online experimentation or shadow testing, and how to measure business impact such as reduced oversells, improved conversion, and fulfillment reliability.
- Failure modes: How you would detect and mitigate feature drift, training-serving skew, stale events, hot SKUs, and regional outages.
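To make the "ML combined with hard business rules" expectation concrete, here is a minimal sketch of a final decisioning step. All names (`SkuState`, the field names, the 0.9 threshold) are illustrative assumptions, not part of the spec; the point is that hard rules veto first and the ML score is gated conservatively:

```python
from dataclasses import dataclass

@dataclass
class SkuState:
    # Hypothetical per-SKU state; field names are illustrative.
    last_confirmed_units: int
    reserved_units: int    # hard rule: reserved stock is never sellable
    compliance_hold: bool  # hard rule: a hold overrides any ML score

def availability_decision(state: SkuState, in_stock_prob: float,
                          threshold: float = 0.9) -> bool:
    """Return True only if the item is safe to promise at checkout.

    Hard business rules run first and can only veto; the ML in-stock
    probability is then gated by a high threshold, reflecting that
    oversells (false positives) are costlier than false negatives.
    """
    if state.compliance_hold:
        return False
    sellable = state.last_confirmed_units - state.reserved_units
    if sellable <= 0:
        return False
    # ML gate: conservative threshold encodes the oversell penalty.
    return in_stock_prob >= threshold
```

A design answer would also explain how `threshold` is tuned (e.g., per-category, from the measured cost of an oversell versus a lost sale) rather than fixed.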
Constraints
- Inventory freshness target is under 5 seconds for first-party warehouses and under 60 seconds for third-party sellers.
- Overselling high-demand items is very costly; at checkout, a false positive (promising stock that cannot be fulfilled) is worse than a false negative (withholding stock that was actually available).
- Some regions require data residency and cannot share raw user-level data across borders.
- The system must continue serving degraded but safe answers during stream lag, model outages, or warehouse feed disruptions.
- Cost matters: most requests must be served on CPU; GPUs are reserved for offline training.
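The degraded-but-safe requirement can be sketched as a fallback that checks data freshness against the per-source targets above and fails closed when the model or the stream is unhealthy. The function, field names, and the safety buffer of 3 units are illustrative assumptions:

```python
import time
from typing import Optional

# Freshness targets from the constraints, in seconds.
FRESHNESS_TARGET = {"first_party": 5.0, "third_party": 60.0}

def safe_availability(source_type: str, last_event_ts: float,
                      model_score: Optional[float], sellable_units: int,
                      now: Optional[float] = None) -> bool:
    """Degrade to a conservative, rule-only answer when events are stale
    or the model is unavailable, rather than failing open."""
    now = time.time() if now is None else now
    stale = (now - last_event_ts) > FRESHNESS_TARGET[source_type]
    if model_score is None or stale:
        # Fail safe: promise only comfortably positive confirmed stock
        # (illustrative buffer of 3 units absorbs unseen mutations).
        return sellable_units > 3
    return sellable_units > 0 and model_score >= 0.9
```

The same shape generalizes to regional outages: a region that loses its feed serves the rule-only branch from its last-known state instead of returning errors.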