Product Context
ShopPulse is a large commerce platform that sends promotional, transactional, and reminder SMS notifications. The business wants an ML-driven system that decides whom to message, when to send within a fixed delivery window, and in what priority order, so that millions of users can be reached efficiently without hurting user experience or carrier deliverability.
Scale
| Signal | Value |
|---|---|
| DAU | 45M |
| SMS-eligible users | 18M |
| Daily notification campaigns | 2,500 |
| Peak campaign audience | 12M users |
| Peak send rate | 220K SMS/sec across providers |
| Historical events/day | 9B user events |
| End-to-end decision latency budget | 150ms per online eligibility request |
| Delivery window | 30 minutes to 6 hours |
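The scale figures can be sanity-checked with back-of-the-envelope arithmetic. The sketch below is illustrative only. It assumes that the whole 220K SMS/sec peak is available to a single campaign and that the peak audience must finish inside the shortest (30-minute) window. In practice, concurrent campaigns, per-provider throttling, and quiet hours shrink the usable rate. The names `CampaignLoad`, `drain_seconds`, `required_rate`, and `capacity_share` are hypothetical.

```python
from dataclasses import dataclass

# Figures taken from the Scale table; treating the whole peak rate as
# available to one campaign is an illustrative assumption.
PEAK_AUDIENCE = 12_000_000        # users
PEAK_RATE_PER_SEC = 220_000       # SMS/sec across all providers
SHORTEST_WINDOW_SEC = 30 * 60     # 30-minute delivery window


@dataclass(frozen=True)
class CampaignLoad:
    audience: int              # recipients remaining after eligibility filtering
    window_seconds: float      # length of the campaign's delivery window


def drain_seconds(audience: int, send_rate_per_sec: float) -> float:
    """Time to message the whole audience at a constant aggregate rate."""
    if send_rate_per_sec <= 0:
        raise ValueError("send rate must be positive")
    return audience / send_rate_per_sec


def required_rate(load: CampaignLoad) -> float:
    """Minimum sustained rate that finishes the audience inside its window."""
    return load.audience / load.window_seconds


def capacity_share(load: CampaignLoad, peak_rate_per_sec: float) -> float:
    """Average fraction of aggregate provider capacity the campaign needs."""
    return required_rate(load) / peak_rate_per_sec


if __name__ == "__main__":
    load = CampaignLoad(PEAK_AUDIENCE, SHORTEST_WINDOW_SEC)
    print(f"drain at full peak rate: {drain_seconds(PEAK_AUDIENCE, PEAK_RATE_PER_SEC):.1f} s")
    print(f"required sustained rate: {required_rate(load):,.0f} SMS/s")
    print(f"share of peak capacity:  {capacity_share(load, PEAK_RATE_PER_SEC):.1%}")
```

Under these assumptions, a 12M-user campaign needs about 6,667 SMS/s on average, roughly 3% of peak capacity. At the full rate it could drain in about 55 seconds. This suggests that aggregate capacity alone is unlikely to be the bottleneck. The limits that bind are more likely per-provider and per-geography throughput, combined with concurrent campaigns.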
Task
Design an end-to-end ML system for high-volume SMS notifications. Your design should address:
- How to frame the problem and define the prediction tasks for candidate selection, ranking, and send-time optimization.
- The end-to-end architecture, including offline training, online/batch serving, feature storage, and feedback logging.
- A multi-stage decision system (retrieval/eligibility → ranking/prioritization → re-ranking or policy layer) that can handle millions of recipients inside a strict time window.
- Model choices for each stage, with tradeoffs around latency, freshness, interpretability, and operational complexity.
- Offline and online evaluation, including business metrics, user-experience guardrails, and experimentation strategy.
- Failure modes at scale: feature drift, training-serving skew, provider outages, duplicate sends, and miscalibrated models.
Constraints
- Must respect user consent, quiet hours, country-specific SMS regulations, and per-user frequency caps.
- Some campaigns are hard-deadline notifications; others are promotional and can be throttled or skipped.
- Delivery providers have variable throughput, cost, and failure rates by geography.
- Real-time user activity is available within ~2 minutes; some conversion labels arrive with a 24-72 hour delay.
- The system should minimize unnecessary sends because SMS cost is material and over-messaging increases unsubscribe risk.
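One common approach is to enforce the hard constraints in the first bullet as a deterministic filter that runs before any model scores a user. The following is a minimal Python sketch under stated assumptions. The `User` and `Policy` types are hypothetical. So are the 21:00 to 08:00 quiet-hours window, the cap of 3 SMS per rolling 7 days, and the `blocked_countries` set, which stands in for country-specific regulation. The sketch also uses fixed UTC offsets in place of real IANA time zones with daylight-saving rules.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, time, timedelta, timezone


@dataclass(frozen=True)
class User:
    user_id: str
    sms_consent: bool
    utc_offset_minutes: int  # hypothetical fixed offset; real systems need IANA zones + DST
    country: str
    recent_send_times: tuple[datetime, ...] = ()  # timezone-aware UTC timestamps of past sends


@dataclass(frozen=True)
class Policy:
    quiet_start: time = time(21, 0)            # hypothetical local quiet hours start
    quiet_end: time = time(8, 0)               # ... and end (wraps past midnight)
    cap_count: int = 3                         # hypothetical: at most 3 SMS
    cap_window: timedelta = timedelta(days=7)  # ... per rolling 7-day window
    blocked_countries: frozenset[str] = frozenset()  # stand-in for country rules


def in_quiet_hours(local_t: time, start: time, end: time) -> bool:
    """True if local_t falls in [start, end), handling windows that cross midnight."""
    if start <= end:
        return start <= local_t < end
    return local_t >= start or local_t < end


def check_eligibility(user: User, now_utc: datetime, policy: Policy) -> tuple[bool, str]:
    """Apply hard rules in order and return (eligible, reason_code)."""
    if not user.sms_consent:
        return False, "no_consent"
    if user.country in policy.blocked_countries:
        return False, "country_rule"
    local_now = now_utc + timedelta(minutes=user.utc_offset_minutes)
    if in_quiet_hours(local_now.time(), policy.quiet_start, policy.quiet_end):
        return False, "quiet_hours"
    cutoff = now_utc - policy.cap_window
    sends_in_window = sum(1 for t in user.recent_send_times if t > cutoff)
    if sends_in_window >= policy.cap_count:
        return False, "frequency_cap"
    return True, "eligible"


if __name__ == "__main__":
    now = datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)
    print(check_eligibility(User("u1", True, 600, "US"), now, Policy()))
```

Returning a reason code alongside the decision makes filtered-out users auditable. It also lets suppression counts be logged per rule, which helps explain why a campaign reached fewer recipients than expected.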