Design Retail Demand Forecasting Platform

Product Context

MercatoAI sells demand forecasting software to large retail chains. Your team wants to present a new forecasting system to a client. The proposed model is much more accurate than the current baseline, but it is also far more expensive to train and serve.

Scale

Signal	Value
Stores	4,500
Active SKUs	1.2M
Store-SKU series	~180M
Daily forecasts generated	~5.4B horizon points (30-day horizon)
Peak client API QPS	8K forecast queries/sec
Batch planning jobs	Nightly, must finish in 3 hours
Interactive latency budget	p99 < 300ms per API request
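
As a quick sanity check on the table above, the throughput implied by the nightly batch window can be worked out directly. This is a minimal sketch: the counts come from the table, while `BYTES_PER_POINT` is a purely illustrative assumption about storing one 4-byte value per horizon point, not a stated requirement.

```python
# Back-of-the-envelope sizing from the Scale table.
STORES = 4_500
ACTIVE_SKUS = 1_200_000
SERIES = 180_000_000          # store-SKU series (approximate, from the table)
HORIZON_DAYS = 30
BATCH_WINDOW_S = 3 * 3600     # nightly job must finish in 3 hours
BYTES_PER_POINT = 4           # ASSUMPTION: one float32 per horizon point


def batch_sizing():
    horizon_points = SERIES * HORIZON_DAYS
    return {
        # Total points produced per nightly run.
        "horizon_points": horizon_points,
        # Sustained throughput needed to fit the 3-hour window.
        "points_per_second": horizon_points / BATCH_WINDOW_S,
        "series_per_second": SERIES / BATCH_WINDOW_S,
        # Share of all possible store x SKU pairs that are active series.
        "assortment_density": SERIES / (STORES * ACTIVE_SKUS),
        # Raw size of one materialized run under the storage assumption.
        "raw_gb_per_run": horizon_points * BYTES_PER_POINT / 1e9,
    }


if __name__ == "__main__":
    for name, value in batch_sizing().items():
        print(f"{name}: {value:,.4f}")
```

The 180M series times a 30-day horizon reproduces the 5.4B horizon points in the table, and the 3-hour window implies roughly 500K horizon points (about 16.7K series) per second of sustained batch throughput across the whole fleet.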

The client uses forecasts for replenishment, pricing, and labor planning. Some workflows are offline and can tolerate batch latency, while others require near-real-time forecast access for planners and downstream systems. Historical data includes 3 years of daily sales, promotions, holidays, stockouts, returns, and limited competitor pricing. New SKUs and new stores are common.

Task

Design an end-to-end ML system and explain how you would decide whether this expensive model is ready to present to the client.

Clarify the product requirements and which use cases need batch vs online forecasts.
Propose a multi-stage forecasting architecture, including when to use a cheap baseline, a medium-cost model, and the expensive model.
Design the training, feature, and serving pipelines, including how forecasts are materialized, cached, and refreshed.
Define the evaluation framework: offline backtesting, segment-level analysis, and online/client-facing validation.
Explain how you would reason about cost vs accuracy vs latency before recommending the model to the client.
Identify major failure modes, especially feature drift, training-serving skew, sparse data, and cold-start entities.
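
One way to make the offline backtesting and segment-level analysis items concrete is a rolling-origin split with an error metric reported per segment. The sketch below is an illustration under assumptions, not a prescribed answer: the helper names, the choice of WAPE as the metric, and the fold spacing are all hypothetical.

```python
from datetime import date, timedelta


def rolling_origin_folds(first_day, last_day, horizon_days=30, n_folds=4, step_days=30):
    """Return (train_end, test_start, test_end) tuples, oldest fold first.

    Each fold trains on data up to train_end and is scored on the
    horizon_days that follow, so no test data leaks into training.
    """
    folds = []
    for k in range(n_folds):
        test_end = last_day - timedelta(days=k * step_days)
        test_start = test_end - timedelta(days=horizon_days - 1)
        train_end = test_start - timedelta(days=1)
        if train_end <= first_day:
            break  # not enough history left for another fold
        folds.append((train_end, test_start, test_end))
    return list(reversed(folds))


def wape(actuals, forecasts):
    """Weighted absolute percentage error; None when total actual demand is zero."""
    denom = sum(abs(a) for a in actuals)
    if denom == 0:
        return None
    return sum(abs(a - f) for a, f in zip(actuals, forecasts)) / denom


def wape_by_segment(rows):
    """rows: iterable of (segment, actual, forecast) -> {segment: WAPE}.

    Caveat: observed sales are censored during stockouts, so such days
    would need to be masked or corrected before being used as actuals.
    """
    grouped = {}
    for segment, actual, forecast in rows:
        acts, fcs = grouped.setdefault(segment, ([], []))
        acts.append(actual)
        fcs.append(forecast)
    return {segment: wape(acts, fcs) for segment, (acts, fcs) in grouped.items()}
```

Segments here could be, for example, high-value SKUs, promoted items, or cold-start series, so that an aggregate improvement does not hide a regression in the slice the client cares about most.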

Constraints

The client will not accept a solution that increases cloud spend by more than 2x unless forecast error improves materially on high-value SKUs.
Promotions and stockouts cause sharp non-stationarity.
Some features, such as future promotions, are available in batch but may be missing or delayed online.
Forecasts must be explainable enough for planners to trust overrides.
The system must support fallback forecasts if the expensive model misses SLA or input features are unavailable.
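
Two of the constraints above, the 2x cloud-spend limit and the fallback requirement, lend themselves to a short sketch. Everything named here is hypothetical: the `Candidate` fields, the 10% threshold standing in for a "material" improvement, the rule that any improvement suffices within the 2x budget, and the tier interface are assumptions chosen for illustration, since the prompt leaves these to the designer.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    cost_ratio: float                 # candidate cloud spend / current cloud spend
    baseline_wape_high_value: float   # current error on high-value SKUs
    candidate_wape_high_value: float  # candidate error on high-value SKUs


def ready_to_present(c, max_cost_ratio=2.0, min_relative_gain=0.10):
    """Gate: within budget any improvement passes; above it the gain must be material.

    ASSUMPTION: "material" is modeled as a relative error reduction of at
    least min_relative_gain on high-value SKUs.
    """
    if c.baseline_wape_high_value <= 0:
        raise ValueError("baseline error must be positive")
    gain = (c.baseline_wape_high_value - c.candidate_wape_high_value) / c.baseline_wape_high_value
    if c.cost_ratio <= max_cost_ratio:
        return gain > 0
    return gain >= min_relative_gain


def forecast_with_fallback(series_id, features, tiers, required):
    """Try tiers from most to least expensive and return (tier_name, forecast).

    tiers: list of (name, predict_fn), where predict_fn(series_id, features)
    may raise TimeoutError when it misses its latency budget.
    required: {name: iterable of feature keys that tier cannot run without}.
    """
    for name, predict in tiers:
        missing = [k for k in required.get(name, ()) if features.get(k) is None]
        if missing:
            continue  # e.g. future promotions delayed online
        try:
            return name, predict(series_id, features)
        except TimeoutError:
            continue  # tier missed its SLA; drop to the next one
    raise RuntimeError("no tier could produce a forecast")
```

Returning the tier name alongside the forecast lets downstream consumers and planners see which model produced a given number, which also supports the explainability constraint.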
