Design Sharded Ad Click Predictor

Product Context

AdNova serves personalized sponsored products across a large e-commerce app. You need to design the ML system that predicts click-through and conversion for ad ranking, while the underlying event and feature infrastructure is high-write: bids change frequently, user interactions stream in continuously, and model features must stay fresh.

Scale

Signal	Value
DAU	90M
Peak ad-request QPS	220K
Peak write QPS to event/feature systems	3.5M
Active ad catalog	45M ads
Advertiser bid/budget updates	400M/day
User interaction events	2.2B/day
p99 serving latency budget	120ms end-to-end
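
As a rough sanity check on the table above, the daily volumes can be converted to average per-second rates and compared with the stated peak write QPS. This is only back-of-envelope arithmetic; the variable names are illustrative, and the table does not say what accounts for the gap between the average event rate and the 3.5M peak (feature fan-out, counters, and bursts are plausible contributors, but that is an assumption).

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def per_second(events_per_day: float) -> float:
    """Average events per second for a daily volume."""
    return events_per_day / SECONDS_PER_DAY

# Figures taken from the Scale table.
bid_updates_avg = per_second(400e6)    # about 4.6K/s on average
interactions_avg = per_second(2.2e9)   # about 25.5K/s on average
peak_write_qps = 3.5e6
peak_request_qps = 220e3

# Ratio of peak writes to peak ad requests (about 15.9 writes per request).
writes_per_request = peak_write_qps / peak_request_qps

# Ratio of peak write QPS to the average rate of the two listed event sources.
peak_to_avg_source_ratio = peak_write_qps / (bid_updates_avg + interactions_avg)

print(f"bid updates avg: {bid_updates_avg:,.0f}/s")
print(f"interactions avg: {interactions_avg:,.0f}/s")
print(f"writes per ad request at peak: {writes_per_request:.1f}")
print(f"peak writes vs. avg source events: {peak_to_avg_source_ratio:.0f}x")
```

The two listed event sources average roughly 30K events/s combined, so the 3.5M peak write figure is about two orders of magnitude higher. A design answer would need to state where those extra writes come from.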

Task

Design an end-to-end ML ranking system for ad serving, with special attention to how sharding and replication choices affect a high-write ML platform.

Clarify product goals and define functional and non-functional requirements.
Propose the online and offline architecture, including retrieval, ranking, and optional re-ranking.
Design storage, sharding, and replication for high-write components such as event logs, online feature store, counters, and model-serving metadata.
Explain how you would keep training and serving features consistent despite delayed writes, replication lag, and partial failures.
Define offline and online evaluation, plus monitoring for drift, skew, freshness, and system health.
Identify major failure modes and mitigations, especially around hot keys, stale replicas, and write amplification.
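
One common way to discuss the hot-key concern in the last item is key salting: writes for a hot key are spread across several sub-keys (and therefore several shards), and reads sum the sub-keys. The sketch below is a minimal in-memory illustration, not a prescribed answer; `NUM_SHARDS`, `SALT_BUCKETS`, and the `ShardedCounter` class are hypothetical names, and the shard count and bucket count are arbitrary.

```python
import hashlib
import random
from collections import defaultdict

NUM_SHARDS = 64    # hypothetical shard count
SALT_BUCKETS = 8   # hypothetical number of sub-keys per hot key

def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Stable hash-based shard assignment (same key always maps to same shard)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

def salted_keys(key: str, buckets: int = SALT_BUCKETS) -> list[str]:
    """All sub-keys that a hot key is spread across."""
    return [f"{key}#{i}" for i in range(buckets)]

class ShardedCounter:
    """In-memory stand-in for a sharded counter store."""

    def __init__(self) -> None:
        # shard id -> stored key -> count
        self.shards = defaultdict(lambda: defaultdict(int))

    def increment(self, key: str, hot: bool = False) -> None:
        # Hot keys write to a randomly chosen sub-key to spread load.
        stored = random.choice(salted_keys(key)) if hot else key
        self.shards[shard_for(stored)][stored] += 1

    def read(self, key: str, hot: bool = False) -> int:
        # Reads of a hot key fan out to every sub-key and sum the results.
        stored_keys = salted_keys(key) if hot else [key]
        return sum(self.shards[shard_for(k)].get(k, 0) for k in stored_keys)

counter = ShardedCounter()
for _ in range(1000):
    counter.increment("ad:123:clicks", hot=True)
counter.increment("ad:456:clicks")

hot_total = counter.read("ad:123:clicks", hot=True)
cold_total = counter.read("ad:456:clicks")
```

The trade-off is write spreading at the cost of read fan-out: each read of a hot key touches up to `SALT_BUCKETS` sub-keys, which matters against a 120ms p99 budget.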

Constraints

Ads with budget exhaustion or policy violations must stop serving within 1 minute.
User features should be fresh within 2 minutes; item/ad features within 5 minutes.
The serving path cannot synchronously depend on a cross-region write.
Cost matters: GPU usage should be limited to the heaviest ranking stage only.
Must support regional data residency for EU users, so some training data cannot leave region.
The system should remain available during shard rebalancing and replica loss.
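
The time-based constraints above can be written down as explicit staleness bounds, which makes them checkable in monitoring or at read time. The following is a minimal sketch under assumptions: the `Record` type and the `kind` labels are hypothetical, and treating the 1-minute stop-serving rule as a 60-second bound on ad status staleness is one possible interpretation, not something the prompt specifies.

```python
from dataclasses import dataclass

# Staleness bounds in seconds, derived from the constraints above.
MAX_STALENESS_S = {
    "ad_status": 60,       # budget exhaustion / policy violation (assumed mapping)
    "user_features": 120,  # user features fresh within 2 minutes
    "ad_features": 300,    # item/ad features fresh within 5 minutes
}

@dataclass(frozen=True)
class Record:
    kind: str          # one of the keys in MAX_STALENESS_S
    updated_at: float  # epoch seconds of the last write visible on this replica

def staleness(record: Record, now: float) -> float:
    """Seconds since the record was last updated (never negative)."""
    return max(0.0, now - record.updated_at)

def is_fresh(record: Record, now: float) -> bool:
    """True if the record is within the staleness bound for its kind."""
    return staleness(record, now) <= MAX_STALENESS_S[record.kind]

now = 1_000.0
user = Record("user_features", updated_at=now - 90)  # 90s old, within 120s
status = Record("ad_status", updated_at=now - 75)    # 75s old, exceeds 60s

user_ok = is_fresh(user, now)
status_ok = is_fresh(status, now)
```

What the serving path does when a record is not fresh (fall back to defaults, read another replica, or drop the ad) is a design decision the answer should justify.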
