Design Sharded Message Recommendation Serving

Product Context

Attentive AI Pro helps marketers generate and personalize SMS and email campaigns. Design the ML system that recommends the next best message variant, offer, or audience treatment for each consumer interaction, while supporting very high write rates from engagement events and campaign updates.

Scale

Signal	Value
Brands onboarded	8,000
Consumer profiles	250M
Peak inbound engagement events	1.2M writes/sec during major retail moments
Peak recommendation QPS	180K requests/sec
Active message / offer catalog	40M active variants
New/updated campaign entities	25M/day
End-to-end p99 latency budget	120ms
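
A quick back-of-envelope pass over the table helps size the data plane before any design choices. The sketch below uses only the figures above, plus two planning inputs that are assumptions rather than part of the problem statement: a hypothetical per-shard write capacity and a headroom factor.

```python
import math

# Figures copied from the Scale table.
PEAK_EVENT_WRITES_PER_SEC = 1_200_000
PEAK_RECO_QPS = 180_000
CAMPAIGN_UPDATES_PER_DAY = 25_000_000
CONSUMER_PROFILES = 250_000_000
BRANDS = 8_000

# Hypothetical planning inputs, NOT given in the problem statement.
ASSUMED_WRITES_PER_SHARD_PER_SEC = 20_000
ASSUMED_HEADROOM = 0.5  # run each shard at 50% of rated capacity at peak

SECONDS_PER_DAY = 86_400

# Average campaign-entity update rate (peaks during sends will be far higher).
avg_campaign_updates_per_sec = CAMPAIGN_UPDATES_PER_DAY / SECONDS_PER_DAY

# How write-heavy the online plane is relative to recommendation reads.
write_to_read_ratio = PEAK_EVENT_WRITES_PER_SEC / PEAK_RECO_QPS

# Mean profiles per brand; real brands are likely very skewed around this.
avg_profiles_per_brand = CONSUMER_PROFILES / BRANDS

# Minimum write shards needed at peak under the assumed capacity and headroom.
min_write_shards = math.ceil(
    PEAK_EVENT_WRITES_PER_SEC
    / (ASSUMED_WRITES_PER_SHARD_PER_SEC * ASSUMED_HEADROOM)
)

print(f"avg campaign updates/sec: {avg_campaign_updates_per_sec:.0f}")
print(f"peak write:read ratio:    {write_to_read_ratio:.1f}")
print(f"avg profiles per brand:   {avg_profiles_per_brand:.0f}")
print(f"min write shards:         {min_write_shards}")
```

The takeaway is that peak writes outnumber recommendation reads by more than 6 to 1, so the event-ingest path, not the ranking path, is likely to drive shard count. The shard figure itself moves directly with the two assumed inputs.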

Task

Clarify product requirements and success metrics for recommending content in Attentive AI Pro.
Design the end-to-end ML architecture, including retrieval, ranking, and any re-ranking or policy layer.
Explain how you would shard and replicate the high-write online data plane: user features, campaign state, counters, embeddings, and feedback logs.
Define the offline and online pipelines, including feature computation, training cadence, and how to avoid training-serving skew.
Propose evaluation, monitoring, and rollback strategies for model, data, and infrastructure failures.
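
One possible starting point for the sharding question is sketched below: route user features by a hash of a brand-prefixed key, and spread hot campaign counters across several salted sub-keys that are summed at read time. This is an illustrative sketch, not a prescribed answer. `NUM_SHARDS`, `COUNTER_SALTS`, `shard_for`, `profile_key`, and `SaltedCounterStore` are hypothetical names, and an in-memory dict stands in for the real key-value store.

```python
import hashlib
import random
from collections import defaultdict

NUM_SHARDS = 128     # hypothetical shard count
COUNTER_SALTS = 16   # hypothetical fan-out for hot campaign counters


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a key to a shard with a stable hash (same result on every host)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def profile_key(brand_id: str, profile_id: str) -> str:
    """Prefix every key with the brand so no key ever spans two brands."""
    return f"{brand_id}:{profile_id}"


class SaltedCounterStore:
    """Spreads writes for one logical counter over several sub-keys.

    Each increment goes to a randomly chosen salted sub-key, so a single hot
    campaign no longer lands on one partition. Reads sum all sub-keys.
    """

    def __init__(self, salts: int = COUNTER_SALTS, rng: random.Random = None):
        self._salts = salts
        self._rng = rng or random.Random()
        self._cells = defaultdict(int)  # stand-in for the real KV store

    def increment(self, brand_id: str, campaign_id: str, metric: str,
                  amount: int = 1) -> None:
        salt = self._rng.randrange(self._salts)
        self._cells[f"{brand_id}:{campaign_id}:{metric}#{salt}"] += amount

    def read(self, brand_id: str, campaign_id: str, metric: str) -> int:
        base = f"{brand_id}:{campaign_id}:{metric}"
        return sum(self._cells.get(f"{base}#{s}", 0)
                   for s in range(self._salts))
```

The trade-off in this sketch is that writes get cheaper and more evenly spread while each read fans out to `COUNTER_SALTS` sub-keys. That suits counters that are written far more often than they are read, or whose totals can be cached for a short interval.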

Constraints

User interaction features must be fresh within 1-2 minutes for triggered messaging use cases.
Some campaign-level features are updated extremely frequently during sends, creating hot partitions.
The system must isolate brands for privacy/compliance while still sharing learnings where allowed.
Cost matters: most ranking traffic should run on CPU; GPU use should be limited to offline training or narrow online stages.
If online feature reads degrade, the system must still return safe recommendations rather than block message delivery.
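
The last constraint can be illustrated with a bounded feature read that falls back to a safe default. The sketch below is a minimal version under stated assumptions: `FEATURE_READ_BUDGET_SEC` is a hypothetical slice of the 120ms p99 budget, and `fetch_features`, `rank`, and `safe_default` are hypothetical callables supplied by the caller (for example, `safe_default` might return a brand-approved default variant).

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical share of the 120ms end-to-end budget given to feature reads.
FEATURE_READ_BUDGET_SEC = 0.030

_executor = ThreadPoolExecutor(max_workers=8)


def recommend(request, fetch_features, rank, safe_default,
              budget_sec: float = FEATURE_READ_BUDGET_SEC) -> dict:
    """Return a ranked variant, or a safe default if feature reads degrade.

    The feature read runs in a worker thread and is abandoned once the
    budget expires, so a slow or failing store never blocks delivery.
    """
    future = _executor.submit(fetch_features, request)
    try:
        features = future.result(timeout=budget_sec)
    except Exception:
        # Covers both a timeout and an error raised by the feature store.
        future.cancel()  # best effort; a read already running is not stopped
        return {"variant": safe_default(request), "degraded": True}
    return {"variant": rank(request, features), "degraded": False}


if __name__ == "__main__":
    def slow_fetch(request):
        time.sleep(0.2)  # simulates a degraded feature store
        return {}

    result = recommend(
        {"brand_id": "brand-1"},
        fetch_features=slow_fetch,
        rank=lambda request, features: "personalized-variant",
        safe_default=lambda request: "brand-default-variant",
        budget_sec=0.02,
    )
    print(result)
```

Returning a `degraded` flag alongside the variant lets downstream logging separate fallback traffic from normally ranked traffic, which matters for both monitoring and for keeping fallback responses out of training data.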
