Product Context
Design the machine learning system that predicts whether a Facebook user will click on an ad shown in Feed, Stories, or Reels. The prediction is used inside Meta’s ads delivery stack to help rank eligible ads for each impression while balancing relevance, advertiser value, and strict latency constraints.
Scale
| Signal | Value |
|---|---|
| DAU | 2.2B Facebook/Instagram users exposed to ads |
| Peak ad impression opportunities | 12M requests/sec globally |
| Eligible ads per auction before pruning | 50K-200K |
| Ads active in last 30 days | 150M creatives/campaigns |
| New/updated ads per day | 8M |
| End-to-end p99 latency budget | 120ms |
| Training events/day | 60B impressions, 900M clicks |
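A few derived quantities follow directly from the table and help size the problem. The sketch below is plain arithmetic on the figures above. The variable names are illustrative. The "unpruned pairs" number is a hypothetical upper bound: it assumes every peak request would score every eligible ad, which no real system does.

```python
# Figures copied from the Scale table above.
IMPRESSIONS_PER_DAY = 60_000_000_000
CLICKS_PER_DAY = 900_000_000
PEAK_REQUESTS_PER_SEC = 12_000_000
ELIGIBLE_ADS_LOW, ELIGIBLE_ADS_HIGH = 50_000, 200_000
SECONDS_PER_DAY = 86_400

# Average CTR implied by the training logs (the positive-class base rate).
base_ctr = CLICKS_PER_DAY / IMPRESSIONS_PER_DAY

# Class imbalance: non-click impressions per click.
negatives_per_positive = (IMPRESSIONS_PER_DAY - CLICKS_PER_DAY) / CLICKS_PER_DAY

# Average logged impressions per second (not the same thing as peak request rate).
avg_impressions_per_sec = IMPRESSIONS_PER_DAY / SECONDS_PER_DAY

# Hypothetical upper bound: (request, ad) pairs per second if nothing were pruned.
unpruned_pairs_low = PEAK_REQUESTS_PER_SEC * ELIGIBLE_ADS_LOW
unpruned_pairs_high = PEAK_REQUESTS_PER_SEC * ELIGIBLE_ADS_HIGH

print(f"base CTR: {base_ctr:.2%}")
print(f"negatives per positive: {negatives_per_positive:.1f}")
print(f"avg logged impressions/sec: {avg_impressions_per_sec:,.0f}")
print(f"unpruned pairs/sec at peak: {unpruned_pairs_low:.1e} to {unpruned_pairs_high:.1e}")
```

The implied base rate of about 1.5% is why calibration and class imbalance matter here, and the unpruned pair count is why a single heavy model cannot score every eligible ad within the latency budget.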
Task
Design an end-to-end ML system for click-through-rate prediction in Meta ads ranking. Address the following:
- Clarify the prediction target, product goals, and how CTR fits into the overall ads auction and ranking objective.
- Propose a multi-stage architecture from candidate retrieval/pruning to ranking and optional re-ranking, including what runs online vs batch.
- Define the data pipeline: labels, delayed feedback handling, feature engineering, training cadence, and how you avoid training-serving skew.
- Choose models for each stage and justify tradeoffs across quality, latency, interpretability, and serving cost.
- Explain offline evaluation, online experimentation, calibration, and business guardrails.
- Identify major failure modes at Meta scale, including feature drift, stale features, cold start, and infrastructure degradation.
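As one way to picture the multi-stage architecture the task asks for, the sketch below shows a two-stage cascade: a cheap scorer prunes all eligible ads to a shortlist, and a heavier scorer ranks only that shortlist. It is a toy illustration under stated assumptions, not a description of Meta's actual stack. The function `cascade_rank`, the field names `cheap` and `heavy`, and the toy scores are all hypothetical.

```python
import heapq
from typing import Callable, Dict, List

Ad = Dict[str, object]


def cascade_rank(
    candidates: List[Ad],
    cheap_score: Callable[[Ad], float],
    heavy_score: Callable[[Ad], float],
    keep: int,
    top_n: int,
) -> List[Ad]:
    """Two-stage funnel: prune with a cheap scorer, rank with a heavy one."""
    # Stage 1: the cheap scorer runs on every candidate; keep the best `keep`.
    shortlist = heapq.nlargest(keep, candidates, key=cheap_score)
    # Stage 2: the heavy scorer runs only on the shortlist.
    ranked = sorted(shortlist, key=heavy_score, reverse=True)
    return ranked[:top_n]


# Toy candidates with precomputed (hypothetical) scores from each stage.
ads = [
    {"id": "a", "cheap": 0.9, "heavy": 0.20},
    {"id": "b", "cheap": 0.8, "heavy": 0.70},
    {"id": "c", "cheap": 0.1, "heavy": 0.99},
    {"id": "d", "cheap": 0.5, "heavy": 0.50},
]

winners = cascade_rank(
    ads,
    cheap_score=lambda ad: ad["cheap"],
    heavy_score=lambda ad: ad["heavy"],
    keep=3,
    top_n=2,
)
winner_ids = [ad["id"] for ad in winners]
print(winner_ids)
```

In this toy data, ad `c` would have scored highest under the heavy model but never reaches it because the cheap stage prunes it. That recall loss at the pruning stage is one of the quality versus latency tradeoffs a design answer should discuss.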
Constraints
- The system must support ads from new advertisers and new creatives with little or no click history.
- User features should be fresh within minutes; campaign budget and pacing signals may update sub-minute.
- Raw creative understanding models can run offline, but not synchronously in the serving path for every request.
- The system must be highly available and degrade gracefully to simpler models if feature services or deep rankers are slow.
- Predictions must be calibrated because downstream auction logic consumes probabilities, not just relative scores.
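The calibration constraint can be made concrete with two simple checks: an overall calibration ratio (sum of predicted probabilities divided by observed clicks) and a binned reliability table. The sketch below is a minimal illustration in standard-library Python. The function names, the bin count, and the toy data are assumptions for illustration, not part of the problem statement.

```python
from typing import List, Sequence, Tuple


def calibration_ratio(preds: Sequence[float], labels: Sequence[int]) -> float:
    """Predicted clicks divided by observed clicks; 1.0 means calibrated on average."""
    observed = sum(labels)
    if observed == 0:
        raise ValueError("need at least one click to compute a calibration ratio")
    return sum(preds) / observed


def reliability_table(
    preds: Sequence[float], labels: Sequence[int], n_bins: int = 10
) -> List[Tuple[float, float, int]]:
    """Per non-empty bin: (mean predicted CTR, observed CTR, example count)."""
    sums = [0.0] * n_bins
    clicks = [0] * n_bins
    counts = [0] * n_bins
    for p, y in zip(preds, labels):
        # Equal-width bins on [0, 1]; a prediction of exactly 1.0 goes in the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        sums[idx] += p
        clicks[idx] += y
        counts[idx] += 1
    return [
        (sums[i] / counts[i], clicks[i] / counts[i], counts[i])
        for i in range(n_bins)
        if counts[i] > 0
    ]


# Toy data: four low-score impressions (one click), two high-score (one click).
preds = [0.1, 0.1, 0.1, 0.1, 0.6, 0.6]
labels = [0, 0, 0, 1, 1, 0]

ratio = calibration_ratio(preds, labels)
table = reliability_table(preds, labels, n_bins=2)
print(f"calibration ratio: {ratio:.2f}")
for mean_pred, observed_ctr, count in table:
    print(f"mean pred {mean_pred:.2f}  observed {observed_ctr:.2f}  n={count}")
```

A ratio below 1.0 means the model under-predicts clicks on average, and above 1.0 means it over-predicts. The binned view matters because a model can have a ratio near 1.0 overall while being miscalibrated within individual score ranges, which is what the downstream auction would actually feel.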