Product Context
AdNova runs a large programmatic advertising exchange. When a user opens a publisher page or app, the exchange must select eligible ads, estimate their value, and return a bid decision in real time on behalf of the advertisers competing in the auction.
Scale
| Signal | Value |
|---|---|
| DAU impacted | 120M users across publisher inventory |
| Peak bid requests | 500K QPS |
| Active ad catalog | 40M creatives / campaigns |
| Eligible candidates per request | 5K-50K before filtering |
| End-to-end latency budget | p99 < 50ms |
| Daily impression logs | ~18B events/day |
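The figures in the table imply some derived rates that are useful when sizing each stage. The following is a back-of-envelope sketch in Python. It uses only the table's numbers, and the variable names are illustrative. The in-flight figure is an upper bound from Little's law, assuming every request takes the full 50 ms budget.

```python
# Back-of-envelope rates derived from the Scale table (names are illustrative).
PEAK_QPS = 500_000
CANDIDATES_MIN, CANDIDATES_MAX = 5_000, 50_000
LATENCY_BUDGET_MS = 50
DAILY_LOG_EVENTS = 18_000_000_000
SECONDS_PER_DAY = 24 * 60 * 60

# Candidate evaluations per second if every eligible ad were scored individually.
candidate_evals_min = PEAK_QPS * CANDIDATES_MIN
candidate_evals_max = PEAK_QPS * CANDIDATES_MAX

# Little's law upper bound: requests in flight = arrival rate x time in system.
in_flight_upper_bound = PEAK_QPS * LATENCY_BUDGET_MS // 1000

# Average logging rate, assuming events are spread evenly over the day.
avg_log_events_per_sec = DAILY_LOG_EVENTS / SECONDS_PER_DAY

print(f"candidate evals/s: {candidate_evals_min:.1e} to {candidate_evals_max:.1e}")
print(f"in-flight requests (upper bound): {in_flight_upper_bound}")
print(f"average log events/s: {avg_log_events_per_sec:,.0f}")
```

Scoring every eligible candidate with a heavy model would mean billions of evaluations per second at peak, which is why the task asks for a staged design with cheap retrieval and filtering ahead of ranking.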
Task
Design an end-to-end ML system for real-time ad bidding under these constraints. Your design should address:
- How to retrieve and filter eligible ads, then score and rank them within the latency budget
- What models you would use at each stage (retrieval, ranking, bid optimization / re-ranking) and why
- How to split computation between batch and online systems, including feature freshness requirements
- How you would train, evaluate, deploy, and monitor the models at scale
- What failure modes are most likely in production and how the system should degrade safely
Constraints
- The system must sustain 500K QPS globally with multi-region failover
- p99 latency must remain under 50ms, including network overhead and feature lookups
- User-level features are privacy-constrained: in some regions, only approved, low-retention signals may be used
- Advertiser budgets, pacing, frequency caps, and policy filters must be enforced online
- Conversion labels are delayed and sparse; click labels are faster but noisier proxies
- Cost matters: the design should avoid requiring GPU inference on every request unless clearly justified
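One way to picture the online enforcement constraint is a cheap eligibility check that runs before any model scoring. The sketch below is a minimal illustration, not a prescribed design. The `Campaign` fields, the `is_eligible` function, and the in-memory impression counts are all hypothetical stand-ins for real-time budget state, frequency-cap counters, and policy rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Campaign:
    # All fields are hypothetical; a real system would read them from online stores.
    campaign_id: str
    remaining_budget: float        # from real-time budget state
    frequency_cap: int             # max impressions per user in the cap window
    blocked_categories: frozenset  # page categories excluded by policy


def is_eligible(campaign, user_impressions, page_category):
    """Return True only if the budget, frequency-cap, and policy checks all pass."""
    if campaign.remaining_budget <= 0:
        return False  # budget exhausted
    if user_impressions.get(campaign.campaign_id, 0) >= campaign.frequency_cap:
        return False  # user has already hit the frequency cap
    if page_category in campaign.blocked_categories:
        return False  # policy filter
    return True


campaigns = [
    Campaign("c1", 120.0, 3, frozenset({"gambling"})),
    Campaign("c2", 0.0, 5, frozenset()),
]
seen = {"c1": 1}  # impressions this user has already received per campaign
eligible = [c.campaign_id for c in campaigns if is_eligible(c, seen, "news")]
print(eligible)
```

Running these checks first keeps ineligible campaigns out of the scoring path, so they consume none of the ranking latency budget.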
Assume the exchange receives request context (page, device, coarse location, timestamp), limited user history where allowed, campaign metadata, and real-time budget state. Focus on the ML system design rather than auction theory details, but explain how predicted CTR/CVR/value estimates interact with final bid selection.
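As a starting point for the interaction between predictions and bids, a common expected-value formulation multiplies the predicted click and conversion probabilities by the advertiser's value per conversion, then scales the result by a pacing multiplier. The sketch below assumes that formulation. The function name, parameters, and candidate tuples are hypothetical, and the prompt does not mandate this particular rule.

```python
def expected_value_bid(p_ctr, p_cvr, conversion_value,
                       pacing_multiplier=1.0, max_bid=float("inf")):
    """Expected value of one impression, scaled by pacing and capped at max_bid.

    p_ctr:             predicted probability of a click given an impression
    p_cvr:             predicted probability of a conversion given a click
    conversion_value:  advertiser's value per conversion (currency units)
    pacing_multiplier: throttle in [0, 1] from the budget pacing controller
    """
    expected_value = p_ctr * p_cvr * conversion_value
    return min(expected_value * pacing_multiplier, max_bid)


# Hypothetical candidates: (campaign_id, p_ctr, p_cvr, conversion_value)
candidates = [
    ("a", 0.02, 0.05, 40.0),
    ("b", 0.01, 0.20, 30.0),
]
bids = {cid: expected_value_bid(ctr, cvr, value) for cid, ctr, cvr, value in candidates}
best = max(bids, key=bids.get)
print(best, bids[best])
```

Under this rule, calibration errors in the CTR or CVR models translate directly into over- or under-bidding, so calibration deserves as much attention as ranking quality.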