Product Context
ShopNow is a large e-commerce marketplace that runs short flash sales across electronics, fashion, and home goods. The notifications team wants an ML-driven system that decides who to alert, when, and through which channel so a flash sale reaches millions of users without overloading downstream systems or spamming low-intent users.
Scale
| Signal | Value |
|---|---|
| DAU | 45M |
| Monthly active users opted into notifications | 120M |
| Peak flash-sale campaigns/day | 300 |
| Largest campaign audience | 18M eligible users |
| Peak send rate target | 1.2M notifications/min |
| Candidate users per campaign | 5M-18M |
| Per-user decision latency (online path) | p99 < 120ms |
| User/item feature freshness | < 5 min for user activity, < 15 min for inventory/pricing |
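A quick back-of-the-envelope check helps relate these figures. The sketch below uses only numbers from the table; the helper name `drain_minutes` is illustrative and not part of any existing system. It shows how long a campaign takes to send if the whole audience is notified at the peak rate.

```python
# Figures copied from the Scale table above.
PEAK_SEND_RATE_PER_MIN = 1_200_000
LARGEST_AUDIENCE = 18_000_000
SMALLEST_CANDIDATE_SET = 5_000_000


def drain_minutes(audience: int, rate_per_min: int) -> float:
    """Minutes needed to notify `audience` users at a constant send rate."""
    return audience / rate_per_min


# Peak rate expressed per second, which is the unit most rate limiters use.
sends_per_second = PEAK_SEND_RATE_PER_MIN / 60

largest_minutes = drain_minutes(LARGEST_AUDIENCE, PEAK_SEND_RATE_PER_MIN)
smallest_minutes = drain_minutes(SMALLEST_CANDIDATE_SET, PEAK_SEND_RATE_PER_MIN)

print(f"{sends_per_second:.0f} sends/sec at peak")
print(f"largest campaign: {largest_minutes:.1f} min to drain")
print(f"smallest candidate set: {smallest_minutes:.1f} min to drain")
```

At the peak target, the largest campaign needs 15 minutes to reach every eligible user, and that assumes a single campaign has the full send budget to itself. Send order and audience pruning therefore matter as much as the scoring model.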
Task
Design an end-to-end ML system for flash-sale notifications. Address the following:
- Clarify the product goal, success metrics, and key constraints, including user fatigue and backend protection.
- Propose a multi-stage architecture for audience selection, ranking, and send-time/channel decisions.
- Define the offline and online data pipelines, labels, and feature store strategy.
- Choose models for each stage and explain the tradeoffs versus simpler heuristics.
- Describe the serving architecture, including rate limiting, batching, fallbacks, and capacity planning.
- Explain how you would evaluate the system offline and online, and how you would detect drift, training-serving skew, and operational failures.
Constraints
- Inventory is limited and can sell out within minutes; stale recommendations are costly.
- Notification providers enforce per-minute throughput caps and occasional regional throttling.
- The system must respect user opt-in status, quiet hours, and compliance rules by country.
- Cost matters: most campaigns should run on CPU-heavy infrastructure, not always-on GPU serving.
- The business wants incremental revenue lift, but guardrails must limit unsubscribe rate, complaint rate, and backend traffic spikes.
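The throughput-cap constraint above is commonly handled with a token bucket. The following is a minimal sketch, not a prescribed design: the class `TokenBucket`, its fields, and the choice to treat the 1.2M/min send target as the provider cap are all assumptions made for illustration. The clock is passed in explicitly so the behavior is deterministic.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Hypothetical per-provider limiter; all names here are illustrative."""
    rate_per_sec: float   # refill rate, e.g. provider cap per minute / 60
    capacity: float       # maximum burst size in notifications
    tokens: float = 0.0   # tokens currently available
    last_ts: float = 0.0  # timestamp of the last refill, in seconds

    def try_acquire(self, now: float, n: int = 1) -> bool:
        """Refill based on elapsed time, then take n tokens if available."""
        elapsed = max(0.0, now - self.last_ts)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate_per_sec)
        self.last_ts = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False


# Assumption: the cap equals the 1.2M/min send target, i.e. 20,000 per second.
bucket = TokenBucket(rate_per_sec=20_000, capacity=20_000, tokens=20_000)

first_batch = bucket.try_acquire(now=0.0, n=20_000)   # bucket is full, so this passes
over_budget = bucket.try_acquire(now=0.0, n=1)        # bucket is now empty
after_refill = bucket.try_acquire(now=0.5, n=10_000)  # 0.5 s refills 10,000 tokens
```

A rejected batch would typically be re-queued or dropped if the sale has sold out in the meantime. Regional throttling could be modeled with one bucket per region whose `rate_per_sec` is lowered when the provider signals back-pressure.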