Product Context
AdNova serves personalized sponsored products across a large e-commerce app. You need to design the ML system that predicts click-through and conversion for ad ranking. The underlying event and feature infrastructure is write-heavy: bids change frequently, user interactions stream in continuously, and model features must stay fresh.
Scale
| Signal | Value |
|---|---|
| DAU | 90M |
| Peak ad-request QPS | 220K |
| Peak write QPS to event/feature systems | 3.5M |
| Active ad catalog | 45M ads |
| Advertiser bid/budget updates | 400M/day |
| User interaction events | 2.2B/day |
| p99 serving latency budget | 120 ms end-to-end |
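A quick back-of-envelope pass over the table helps size the problem before choosing a design. The sketch below converts the daily volumes into average per-second rates and compares them with the stated peak write QPS. The table does not break down the 3.5M figure, so reading the gap as a mix of peak-to-average ratio and write fan-out (feature updates, counters, replicas) is an assumption, not something the prompt states.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Figures copied from the Scale table.
DAU = 90_000_000
PEAK_WRITE_QPS = 3_500_000
BID_UPDATES_PER_DAY = 400_000_000
INTERACTION_EVENTS_PER_DAY = 2_200_000_000


def avg_per_second(per_day: int) -> float:
    """Average rate per second for a daily volume, assuming a flat load."""
    return per_day / SECONDS_PER_DAY


avg_bid_updates = avg_per_second(BID_UPDATES_PER_DAY)
avg_interactions = avg_per_second(INTERACTION_EVENTS_PER_DAY)
avg_raw_ingress = avg_bid_updates + avg_interactions

# Ratio of stated peak write QPS to average raw event ingress.
# Assumption: the gap comes from traffic peaks plus write fan-out.
write_gap = PEAK_WRITE_QPS / avg_raw_ingress

events_per_user_per_day = INTERACTION_EVENTS_PER_DAY / DAU

if __name__ == "__main__":
    print(f"avg bid/budget updates per second: {avg_bid_updates:,.0f}")
    print(f"avg interaction events per second: {avg_interactions:,.0f}")
    print(f"peak write QPS / avg raw ingress:  {write_gap:.0f}x")
    print(f"interaction events per DAU per day: {events_per_user_per_day:.1f}")
```

Under these numbers the raw event streams average roughly 30K writes per second, so the 3.5M peak is about two orders of magnitude higher. A good clarifying question is how much of that comes from derived writes rather than raw events.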
Task
Design an end-to-end ML ranking system for ad serving, with special attention to how sharding and replication choices affect a high-write ML platform.
- Clarify product goals and define functional and non-functional requirements.
- Propose the online and offline architecture, including retrieval, ranking, and optional re-ranking.
- Design storage, sharding, and replication for high-write components such as event logs, online feature store, counters, and model-serving metadata.
- Explain how you would keep training and serving features consistent despite delayed writes, replication lag, and partial failures.
- Define offline and online evaluation, plus monitoring for drift, skew, freshness, and system health.
- Identify major failure modes and mitigations, especially around hot keys, stale replicas, and write amplification.
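One common mitigation for the hot-key problem named in the last bullet is key salting: a hot logical key (for example, the click counter of a very popular ad) is split across several physical sub-keys so that writes land on different shards, and reads sum the sub-keys. The sketch below is a minimal in-memory illustration. The class `SaltedCounter`, the `key#salt` naming scheme, and the parameters `num_shards` and `fanout` are all hypothetical and not tied to any particular store.

```python
import hashlib
import random
from collections import defaultdict


class SaltedCounter:
    """In-memory stand-in for a sharded counter store (hypothetical)."""

    def __init__(self, num_shards: int, fanout: int, rng: random.Random = None):
        self.num_shards = num_shards
        self.fanout = fanout  # number of physical sub-keys per logical key
        self.shards = [defaultdict(int) for _ in range(num_shards)]
        self._rng = rng or random.Random()

    def _shard_for(self, physical_key: str) -> int:
        # Stable hash so every process routes a sub-key to the same shard.
        digest = hashlib.sha256(physical_key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.num_shards

    def increment(self, key: str, amount: int = 1) -> None:
        # Pick a random salt so writes to one hot key spread across sub-keys.
        salt = self._rng.randrange(self.fanout)
        physical_key = f"{key}#{salt}"
        self.shards[self._shard_for(physical_key)][physical_key] += amount

    def read(self, key: str) -> int:
        # Reads fan out to every sub-key and sum the partial counts.
        total = 0
        for salt in range(self.fanout):
            physical_key = f"{key}#{salt}"
            total += self.shards[self._shard_for(physical_key)].get(physical_key, 0)
        return total


if __name__ == "__main__":
    counter = SaltedCounter(num_shards=8, fanout=16, rng=random.Random(0))
    for _ in range(10_000):
        counter.increment("ad:123:clicks")
    print(counter.read("ad:123:clicks"))
    print(sum(1 for shard in counter.shards if shard), "shards received writes")
```

The trade-off is read amplification: each read touches `fanout` sub-keys, so salting suits counters that are written far more often than they are read, or whose reads can be served from a periodically aggregated value.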
Constraints
- Ads with budget exhaustion or policy violations must stop serving within 1 minute.
- User features should be fresh within 2 minutes; item/ad features within 5 minutes.
- The serving path cannot synchronously depend on a cross-region write.
- Cost matters: GPU usage should be limited to the heaviest ranking stage only.
- The system must support regional data residency for EU users, so some training data cannot leave its region.
- The system should remain available during shard rebalancing and replica loss.
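The time-based constraints above can be written down as explicit staleness limits that a monitor or the serving path checks against. The sketch below is one way to do that. The names `MAX_AGE_SECONDS`, `is_stale`, and `stale_kinds` are hypothetical, and mapping "must stop serving within 1 minute" to a maximum age for an eligibility snapshot is an assumption about how that constraint would be enforced.

```python
from __future__ import annotations

# Maximum tolerated age in seconds, taken from the Constraints list.
# "eligibility" is an assumed snapshot of budget and policy status.
MAX_AGE_SECONDS = {
    "eligibility": 60,     # budget exhaustion or policy violation: 1 minute
    "user_feature": 120,   # user features: 2 minutes
    "ad_feature": 300,     # item/ad features: 5 minutes
}


def is_stale(kind: str, written_at: float, now: float) -> bool:
    """Return True if data of this kind is older than its freshness limit."""
    return (now - written_at) > MAX_AGE_SECONDS[kind]


def stale_kinds(last_written: dict[str, float], now: float) -> list[str]:
    """List every kind whose last write timestamp violates its limit."""
    return sorted(
        kind for kind, ts in last_written.items() if is_stale(kind, ts, now)
    )


if __name__ == "__main__":
    now = 1_000.0
    # Last write timestamps as seen by one replica (illustrative values).
    replica_view = {"eligibility": 930.0, "user_feature": 900.0, "ad_feature": 800.0}
    print(stale_kinds(replica_view, now))
```

What happens when a kind is stale (fail closed for eligibility, fall back to default features, or route to another replica) is a design decision the prompt leaves open.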