## Product Context
AdNova is a self-serve ads platform used by mid-market and enterprise advertisers to target users for email, push, and in-app campaigns. You are designing the ML system that predicts which users should be included in a campaign audience, with feature drift treated as a first-class design concern.
## Scale
| Signal | Value |
|---|---|
| Monthly active users eligible for targeting | 120M |
| Daily active users | 35M |
| Active advertisers | 180K |
| Campaign launches/day | 1.2M |
| Peak audience prediction QPS | 220K |
| User-feature updates/day | 9B |
| Feature dimensions after joins | ~2,500 |
| End-to-end online scoring p99 | 120ms |
Campaigns vary from broad re-engagement blasts to narrow high-value segments. User behavior, campaign mix, seasonality, privacy constraints, and upstream schema changes frequently shift feature distributions.
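One standard way to quantify such shifts is the Population Stability Index (PSI), computed per feature between a training-time reference sample and a recent serving-time sample. A minimal sketch (bin edges, epsilon smoothing, and the common 0.1/0.25 alert thresholds are conventional choices, not requirements from this prompt):

```python
import numpy as np

def population_stability_index(expected, actual, bins=10):
    """PSI between a reference (training-time) sample and a recent
    (serving-time) sample of one feature.

    Bin edges come from the reference sample's quantiles so every bin
    starts with roughly equal mass; both samples are clipped into the
    reference range so no observations fall outside the histogram.
    """
    edges = np.unique(np.quantile(expected, np.linspace(0, 1, bins + 1)))
    e_counts, _ = np.histogram(np.clip(expected, edges[0], edges[-1]), bins=edges)
    a_counts, _ = np.histogram(np.clip(actual, edges[0], edges[-1]), bins=edges)
    eps = 1e-6  # smooth empty bins to avoid log(0)
    e_frac = np.clip(e_counts / len(expected), eps, None)
    a_frac = np.clip(a_counts / len(actual), eps, None)
    return float(np.sum((a_frac - e_frac) * np.log(a_frac / e_frac)))
```

A common operating rule is PSI < 0.1 for "stable", 0.1-0.25 for "watch", and > 0.25 for "significant shift, investigate"; at ~2,500 feature dimensions the per-feature results would need aggregation and alert budgeting rather than one alarm per feature.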
## Task
- Define the functional and non-functional requirements for a campaign targeting system that supports both batch audience generation and low-latency online scoring.
- Propose an end-to-end architecture, including data ingestion, feature computation, training, model serving, and a monitoring loop for feature drift, label drift, and training-serving skew.
- Design a multi-stage decision system where appropriate (for example: candidate retrieval/filtering → scoring/ranking → policy re-ranking or thresholding), and justify where batch vs online inference should be used.
- Explain how you would build the feature store and feature pipelines so that the same features are used consistently in training and serving.
- Define offline and online evaluation, including how you would detect when drift is hurting campaign performance before advertisers notice.
- Identify major failure modes and mitigations, especially around stale features, delayed labels, schema changes, privacy deletions, and cost/latency regressions.
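One way to frame the train/serve consistency requirement above is a single-definition feature registry: each transformation is declared exactly once and the identical code path runs in both the batch training pipeline and the online serving path. A sketch under that assumption (the registry class, feature names, and defaults below are illustrative, not part of the prompt):

```python
import math
from typing import Callable, Dict

FeatureFn = Callable[[dict], float]

class FeatureRegistry:
    """Hypothetical registry: one definition per feature, shared by
    batch (training) and online (serving) feature computation."""

    def __init__(self) -> None:
        self._features: Dict[str, FeatureFn] = {}

    def register(self, name: str):
        """Decorator that records a feature transformation under `name`."""
        def wrap(fn: FeatureFn) -> FeatureFn:
            self._features[name] = fn
            return fn
        return wrap

    def vector(self, raw: dict) -> Dict[str, float]:
        # Same code path whether `raw` came from an offline join
        # (training) or an online feature-store lookup (serving).
        return {name: fn(raw) for name, fn in self._features.items()}

registry = FeatureRegistry()

@registry.register("clicks_7d_log")
def clicks_7d_log(raw: dict) -> float:
    return math.log1p(raw.get("clicks_7d", 0))

@registry.register("days_since_last_open")
def days_since_last_open(raw: dict) -> float:
    return float(raw.get("days_since_last_open", 999))  # sentinel: never opened
```

The design choice being illustrated: skew from divergent SQL/streaming reimplementations is eliminated by construction, so remaining skew sources reduce to data freshness and join-time differences, which are easier to monitor.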
## Constraints
- Advertisers expect fresh audiences within 15 minutes of major user behavior changes.
- Some features are only available in batch; others arrive via streaming events.
- Privacy rules require user deletion requests to propagate within 24 hours.
- Cost matters: average scoring cost must stay below $0.001 per 1,000 user-campaign evaluations.
- The system must remain available during partial feature pipeline outages, even if it must degrade gracefully.
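A quick sanity check of the cost constraint against the Scale table, assuming (as an upper bound) that peak QPS were sustained around the clock:

```python
# Back-of-the-envelope ceiling implied by the $0.001 / 1,000-evaluation
# budget at peak scoring volume. Sustaining peak QPS all day overstates
# real volume, so this is a conservative upper bound.

peak_qps = 220_000        # peak audience-prediction QPS (Scale table)
cost_per_1k = 0.001       # dollars per 1,000 user-campaign evaluations

evals_per_day = peak_qps * 86_400
daily_budget = evals_per_day / 1_000 * cost_per_1k

print(f"{evals_per_day:,} evaluations/day -> ${daily_budget:,.0f}/day ceiling")
```

That works out to roughly 19B evaluations and a ~$19K/day spend ceiling, which is why the multi-stage design matters: cheap retrieval/filtering must eliminate most user-campaign pairs before any model at full feature width runs.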