Product Context
MercatoAI sells demand forecasting software to large retail chains. Your team wants to present a new forecasting system to a client. The proposed model is much more accurate than the current baseline, but it is also far more expensive to train and serve.
Scale
| Signal | Value |
|---|---|
| Stores | 4,500 |
| Active SKUs | 1.2M |
| Store-SKU series | ~180M |
| Daily forecasts generated | ~5.4B horizon points (30-day horizon) |
| Peak client API QPS | 8K forecast queries/sec |
| Batch planning jobs | Nightly, must finish in 3 hours |
| Interactive latency budget | p99 < 300ms per API request |
The client uses forecasts for replenishment, pricing, and labor planning. Some workflows are offline and can tolerate batch latency, while others require near-real-time forecast access for planners and downstream systems. Historical data includes 3 years of daily sales, promotions, holidays, stockouts, returns, and limited competitor pricing. New SKUs and new stores are common.
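
As a rough sanity check on the Scale table, the figures can be turned into throughput targets. The sketch below uses only the numbers given above. It assumes the whole 30-day horizon is regenerated for every series each night and that the full 3-hour window is usable, neither of which the prompt states.

```python
# Back-of-envelope capacity check using only figures from the Scale table.
STORES = 4_500
ACTIVE_SKUS = 1_200_000
STORE_SKU_SERIES = 180_000_000   # "~180M" store-SKU series
HORIZON_DAYS = 30
BATCH_WINDOW_HOURS = 3

# Every possible store x SKU pair vs. the series that are actually forecast.
possible_pairs = STORES * ACTIVE_SKUS
active_fraction = STORE_SKU_SERIES / possible_pairs

# Horizon points written per nightly run (assumes a full refresh each night).
horizon_points = STORE_SKU_SERIES * HORIZON_DAYS

# Sustained throughput needed to finish inside the batch window.
batch_seconds = BATCH_WINDOW_HOURS * 3600
series_per_sec = STORE_SKU_SERIES / batch_seconds
points_per_sec = horizon_points / batch_seconds

print(f"active share of store-SKU pairs: {active_fraction:.1%}")
print(f"horizon points per night: {horizon_points:,}")
print(f"required series/sec: {series_per_sec:,.0f}")
print(f"required horizon points/sec: {points_per_sec:,.0f}")
```

Under these assumptions the nightly job must sustain roughly 16.7K series per second, or 500K horizon points per second. Only about 3.3% of the possible store-SKU pairs are active series. The product 4,500 × 1.2M also comes to 5.4B, but that is the count of possible pairs, not the 5.4B horizon points in the table.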
Task
Design an end-to-end ML system and explain how you would decide whether this expensive model is ready to present to the client.
- Clarify the product requirements and which use cases need batch vs online forecasts.
- Propose a multi-stage forecasting architecture, including when to use a cheap baseline, a medium-cost model, and the expensive model.
- Design the training, feature, and serving pipelines, including how forecasts are materialized, cached, and refreshed.
- Define the evaluation framework: offline backtesting, segment-level analysis, and online/client-facing validation.
- Explain how you would reason about cost vs accuracy vs latency before recommending the model to the client.
- Identify major failure modes, especially feature drift, training-serving skew, sparse data, and cold-start entities.
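
One way to make the segment-level analysis concrete is to compute an error metric per segment for each candidate model. The sketch below uses WAPE (sum of absolute errors divided by sum of actuals), which is one common choice and is not prescribed by the prompt. The segment names, the row layout, and all numbers are hypothetical and invented for illustration. The rows are assumed to come from a rolling-origin backtest.

```python
from collections import defaultdict


def wape(actuals, forecasts):
    """Weighted absolute percentage error: sum|a - f| / sum|a|."""
    total_abs_error = sum(abs(a - f) for a, f in zip(actuals, forecasts))
    total_actual = sum(abs(a) for a in actuals)
    return total_abs_error / total_actual if total_actual else float("nan")


def wape_by_segment(rows, model_key):
    """Group backtest rows by segment and compute WAPE for one model."""
    grouped = defaultdict(lambda: ([], []))
    for row in rows:
        actuals, forecasts = grouped[row["segment"]]
        actuals.append(row["actual"])
        forecasts.append(row[model_key])
    return {seg: wape(a, f) for seg, (a, f) in grouped.items()}


# Hypothetical backtest rows; every value here is made up for illustration.
rows = [
    {"segment": "high_value", "actual": 100, "baseline": 80, "expensive": 95},
    {"segment": "high_value", "actual": 50, "baseline": 65, "expensive": 52},
    {"segment": "long_tail", "actual": 2, "baseline": 1, "expensive": 3},
    {"segment": "long_tail", "actual": 0, "baseline": 1, "expensive": 0},
    {"segment": "cold_start", "actual": 10, "baseline": 4, "expensive": 5},
]

baseline = wape_by_segment(rows, "baseline")
expensive = wape_by_segment(rows, "expensive")
for seg in baseline:
    print(f"{seg}: baseline={baseline[seg]:.3f} expensive={expensive[seg]:.3f}")
```

Reporting the metric per segment keeps a large aggregate gain from hiding a weak result on cold-start or long-tail series, and it isolates the high-value SKUs that the cost constraint depends on.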
Constraints
- The client will not accept a solution that increases cloud spend by more than 2x unless forecast error improves materially on high-value SKUs.
- Promotions and stockouts cause sharp non-stationarity.
- Some features, such as future promotions, are available in batch but may be missing or delayed online.
- Forecasts must be explainable enough for planners to trust overrides.
- The system must support fallback forecasts if the expensive model misses SLA or input features are unavailable.
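
The spend and fallback constraints above can be sketched as two small checks. This is a minimal illustration and not a prescribed design. `TierResult`, `pick_forecast`, and `passes_spend_gate` are hypothetical names. The 10% threshold for a "material" improvement is an assumption, since the prompt does not define it. The 300 ms figure reuses the p99 budget from the Scale table as a per-request cutoff.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TierResult:
    name: str
    forecast: Optional[list]  # None means the tier produced nothing
    latency_ms: float
    features_ok: bool         # False if required inputs were missing or delayed


def pick_forecast(results, latency_budget_ms=300.0):
    """Return (tier name, forecast) for the first usable tier in preference order."""
    for r in results:
        if r.forecast is not None and r.features_ok and r.latency_ms <= latency_budget_ms:
            return r.name, r.forecast
    raise RuntimeError("no tier produced a usable forecast")


def passes_spend_gate(spend_ratio, high_value_error_reduction, material_reduction=0.10):
    """Spend above 2x is acceptable only with a material gain on high-value SKUs.

    material_reduction is an assumed threshold; the prompt leaves "materially" undefined.
    """
    if spend_ratio <= 2.0:
        return True
    return high_value_error_reduction >= material_reduction


# Hypothetical request: the expensive tier is too slow, the medium tier lacks features.
results = [
    TierResult("expensive", [12.0, 11.5], latency_ms=450.0, features_ok=True),
    TierResult("medium", [11.0, 11.0], latency_ms=40.0, features_ok=False),
    TierResult("baseline", [10.0, 10.0], latency_ms=5.0, features_ok=True),
]
print(pick_forecast(results))
print(passes_spend_gate(2.5, 0.04))
print(passes_spend_gate(2.5, 0.15))
```

In this example the request falls through to the baseline tier, and a 2.5x spend increase passes the gate only in the second call. A production system would enforce the latency budget by cancelling the slow call instead of inspecting latency after the fact. The check here is simplified to show the ordering logic.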