Product Context
ShopNow is a large e-commerce marketplace that runs app push, email, and SMS notifications for limited-time flash sales. The goal is to notify the right users about the right sale quickly, without overloading downstream delivery systems or spamming users.
Scale
| Signal | Value |
|---|---|
| DAU | 45M |
| Monthly active shoppers | 120M |
| Peak flash-sale events/day | 35K |
| Peak candidate users for a major sale | 18M |
| Notification decision QPS during event spikes | 220K |
| End-to-end decision latency budget (p99) | 250 ms |
| Active product catalog | 80M SKUs |
| User feature freshness target | < 5 minutes |
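
A quick back-of-envelope pass over the table helps frame the sizing discussion. The sketch below only derives quantities from the values above; the function name `sizing` and the dictionary keys are illustrative, and treating the p99 budget as the per-request latency is an assumption that makes the in-flight figure an upper bound.

```python
# Back-of-envelope sizing derived only from the Scale table.
DAU = 45_000_000
PEAK_EVENTS_PER_DAY = 35_000
PEAK_CANDIDATES = 18_000_000      # candidate users for one major sale
PEAK_DECISION_QPS = 220_000
P99_LATENCY_S = 0.250             # assumption: used as per-request latency

def sizing():
    return {
        # Time to make one decision per candidate at peak QPS.
        "seconds_to_decide_major_sale": PEAK_CANDIDATES / PEAK_DECISION_QPS,
        # Little's law: concurrency = arrival rate * time in system.
        "max_in_flight_decisions": PEAK_DECISION_QPS * P99_LATENCY_S,
        # Average event creation rate on a peak day.
        "avg_events_per_second": PEAK_EVENTS_PER_DAY / 86_400,
        # Share of daily actives that a major sale can touch.
        "candidate_share_of_dau": PEAK_CANDIDATES / DAU,
    }

if __name__ == "__main__":
    for name, value in sizing().items():
        print(f"{name}: {value:,.2f}")
```

At these numbers, scoring the full 18M audience for a single major sale takes roughly 82 seconds at peak QPS, which is short relative to a 5-60 minute sale but not negligible for the shortest ones.
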
Task
Design an end-to-end ML system for flash-sale notification targeting and fanout. Assume each sale event has a short lifetime (5-60 minutes), inventory may be limited, and users can receive notifications through multiple channels.
Address the following:
- Clarify product goals, success metrics, and the key functional/non-functional requirements.
- Size the system and propose a multi-stage architecture for audience retrieval, ranking, and final re-ranking/throttling before delivery.
- Choose models for each stage and explain what runs online vs batch vs stream processing.
- Define the data pipeline, labels, feature store strategy, and how you prevent training-serving skew.
- Explain offline and online evaluation, including experimentation and guardrails against user fatigue and oversell.
- Identify major failure modes and how you would detect and mitigate them.
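
To make the final re-ranking/throttling stage concrete, here is a minimal sketch of one possible shape for it, not a reference answer. Everything in it is hypothetical: the `Candidate` fields, the `fanout_per_unit` multiplier that ties audience size to inventory, and the `daily_cap` fatigue limit are illustrative assumptions, and the scores are assumed to come from an upstream ranker.

```python
from dataclasses import dataclass
import heapq

@dataclass(frozen=True)
class Candidate:
    user_id: str
    score: float        # hypothetical ranker output, higher is better
    sent_last_24h: int  # notifications already sent to this user

def select_audience(candidates, inventory, fanout_per_unit=3.0, daily_cap=2):
    """Return the candidates to notify, best score first.

    Drops users at or above the fatigue cap, then limits the audience to
    inventory * fanout_per_unit so limited stock is not over-notified.
    """
    eligible = [c for c in candidates if c.sent_last_24h < daily_cap]
    budget = int(inventory * fanout_per_unit)
    return heapq.nlargest(budget, eligible, key=lambda c: c.score)

if __name__ == "__main__":
    pool = [
        Candidate("a", 0.9, 0),
        Candidate("b", 0.8, 2),  # at the cap, filtered out
        Candidate("c", 0.5, 0),
        Candidate("d", 0.7, 1),
    ]
    chosen = select_audience(pool, inventory=1, fanout_per_unit=2.0)
    print([c.user_id for c in chosen])
```

Tying the send budget to inventory is one way to address the oversell guardrail; a real design would also need per-channel budgets and consent checks before delivery.
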
Constraints
- A sale can go viral unexpectedly; traffic may jump 10x within 2 minutes.
- Inventory is limited, so over-notifying can create a poor user experience and waste delivery cost.
- Push delivery is cheap, SMS is expensive, and some regions require explicit consent and quiet-hour compliance.
- User intent shifts quickly during promotions, so stale features can materially hurt relevance.
- The system must degrade safely if ranking services or feature stores are unavailable; sending no notification is preferable to sending clearly bad or non-compliant notifications.
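
The last constraint implies a fail-closed gate in front of delivery: any missing signal should resolve to "do not send". The sketch below illustrates that idea under stated assumptions. The `UserState` fields, the 21:00 to 08:00 quiet window, and the 0.5 score threshold are all hypothetical; real quiet hours and consent rules vary by region.

```python
from dataclasses import dataclass
from datetime import time
from typing import Optional

@dataclass(frozen=True)
class UserState:
    has_consent: Optional[bool]  # None: consent store was unreachable
    local_time: Optional[time]   # None: user timezone could not be resolved
    score: Optional[float]       # None: ranker or feature store failed

# Assumed quiet window, 21:00 to 08:00 local time.
QUIET_START = time(21, 0)
QUIET_END = time(8, 0)

def in_quiet_hours(t: time) -> bool:
    # The window wraps past midnight, so the two halves are OR-ed.
    return t >= QUIET_START or t < QUIET_END

def should_send(u: UserState, threshold: float = 0.5) -> bool:
    """Fail closed: unknown consent, time, or score all mean no send."""
    if u.has_consent is not True:
        return False
    if u.local_time is None or in_quiet_hours(u.local_time):
        return False
    if u.score is None:
        return False
    return u.score >= threshold

if __name__ == "__main__":
    print(should_send(UserState(True, time(12, 0), 0.9)))   # all signals present
    print(should_send(UserState(True, time(12, 0), None)))  # ranker unavailable
```

Modeling an unreachable dependency as `None` rather than a default value keeps an outage from silently turning into a permissive decision.
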