Design Real-Time Ad Bidding Ranker

Product Context

AdNova runs a large programmatic advertising exchange. When a user opens a publisher page or app, the exchange must select eligible ads, estimate each ad's value, and return a bid decision in real time on behalf of the advertisers competing in the auction.

Scale

Signal	Value
DAU impacted	120M users across publisher inventory
Peak bid requests	500K QPS
Active ad catalog	40M creatives / campaigns
Eligible candidates per request	5K-50K before filtering
End-to-end latency budget	p99 < 50ms
Daily impression logs	~18B events/day
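
A quick back-of-envelope pass over the table helps show why a single heavy model cannot score every eligible candidate. The sketch below only multiplies the figures given above; the function and constant names are illustrative, and it assumes the worst case where every peak request carries the full candidate set.

```python
# Back-of-envelope load estimates derived only from the Scale table.
PEAK_QPS = 500_000                 # peak bid requests per second
CANDIDATES_MIN = 5_000             # eligible candidates per request (low end)
CANDIDATES_MAX = 50_000            # eligible candidates per request (high end)
DAILY_EVENTS = 18_000_000_000      # ~18B impression log events per day
SECONDS_PER_DAY = 86_400


def candidate_evals_per_second(qps: int, candidates_per_request: int) -> int:
    """Candidate evaluations per second if every candidate were scored."""
    return qps * candidates_per_request


def average_events_per_second(daily_events: int) -> float:
    """Average log ingestion rate implied by a daily event count."""
    return daily_events / SECONDS_PER_DAY


low = candidate_evals_per_second(PEAK_QPS, CANDIDATES_MIN)    # 2.5 billion/s
high = candidate_evals_per_second(PEAK_QPS, CANDIDATES_MAX)   # 25 billion/s
avg_log_rate = average_events_per_second(DAILY_EVENTS)        # about 208K events/s

print(f"candidate evaluations/s: {low:,} to {high:,}")
print(f"average log events/s: {avg_log_rate:,.0f}")
```

At billions of candidate evaluations per second, cheap filtering and retrieval have to cut the set by orders of magnitude before any expensive ranking model runs.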

Task

Design an end-to-end ML system for real-time ad bidding under these constraints. Your design should address:

How to retrieve and filter eligible ads, then score and rank them within the latency budget
What models you would use at each stage (retrieval, ranking, bid optimization / re-ranking) and why
How to split computation between batch and online systems, including feature freshness requirements
How you would train, evaluate, deploy, and monitor the models at scale
What failure modes are most likely in production and how the system should degrade safely

Constraints

The system must sustain 500K QPS globally with multi-region failover
p99 latency must remain under 50ms, including network overhead and feature lookups
User-level features are privacy-constrained: only approved, low-retention signals may be used in some regions
Advertiser budgets, pacing, frequency caps, and policy filters must be enforced online
Conversion labels are delayed and sparse; click labels are faster but noisier proxies
Cost matters: the design should avoid requiring GPU inference on every request unless clearly justified

Assume the exchange receives request context (page, device, coarse location, timestamp), limited user history where allowed, campaign metadata, and real-time budget state. Focus on the ML system design rather than auction theory details, but explain how predicted CTR/CVR/value estimates interact with final bid selection.
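
One common way to connect the predicted rates to a bid, shown here as a minimal sketch rather than a required answer, is to bid the expected value of the impression, scaled by a pacing multiplier and capped by advertiser limits. Every name below (`Candidate`, `pacing_multiplier`, `max_bid`, `select_bid`) is hypothetical. The sketch assumes the advertiser values a conversion at a fixed amount and that conversions only follow clicks; it does not model auction-specific bid shading.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class Candidate:
    """Hypothetical per-ad inputs available after the ranking stage."""
    ad_id: str
    p_ctr: float                 # predicted click probability
    p_cvr: float                 # predicted conversion probability given a click
    value_per_conversion: float  # advertiser's value of one conversion
    pacing_multiplier: float     # in [0, 1]; lowered when spend runs ahead of plan
    max_bid: float               # advertiser-configured bid cap
    budget_remaining: float      # real-time remaining budget


def expected_value(c: Candidate) -> float:
    """Expected value of one impression: pCTR * pCVR * value per conversion."""
    return c.p_ctr * c.p_cvr * c.value_per_conversion


def bid_for(c: Candidate) -> float:
    """Pacing-adjusted bid, capped by the bid limit and the remaining budget."""
    if c.budget_remaining <= 0:
        return 0.0  # budget exhausted: enforce online by not bidding
    raw = expected_value(c) * c.pacing_multiplier
    return min(raw, c.max_bid, c.budget_remaining)


def select_bid(candidates: Iterable[Candidate]) -> Optional[Tuple[str, float]]:
    """Return (ad_id, bid) for the highest positive bid, or None if no ad can bid."""
    best: Optional[Candidate] = None
    best_bid = 0.0
    for c in candidates:
        b = bid_for(c)
        if b > best_bid:
            best, best_bid = c, b
    return (best.ad_id, best_bid) if best is not None else None
```

In this framing, calibration of the CTR and CVR predictions matters as much as ranking quality, because the predicted probabilities feed directly into the amount bid.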
