Product Context
Meta wants to detect and act on hateful text comments across Facebook and Instagram. The system should score comments in real time for enforcement and also support downstream human review, user reporting, and policy analytics.
Scale
| Signal | Value |
|---|---|
| DAU impacted | 1.8B users across Facebook + Instagram |
| New text comments/day | 9B |
| Peak comment creation QPS | 220K |
| Peak comment-view QPS needing precomputed safety scores | 1.2M |
| Supported languages | 120+ |
| p99 latency budget for write-time decision | 120ms end-to-end |
| Human review queue capacity | ~8M comments/day |
Task
Design an end-to-end ML system for hate-speech detection on text comments. Your design should address:
- How you define the prediction target, policy tiers, and product actions (allow, downrank, send to review, remove).
- The full architecture from data collection and labeling to online serving, including a multi-stage pipeline rather than a single model (a minimal cascade sketch follows this list).
- Model choices for fast filtering, ranking, and high-precision re-scoring under strict latency and cost constraints.
- Offline and online evaluation, including how you handle delayed labels, policy changes, and multilingual performance.
- Monitoring, failure modes, and rollback plans, especially for feature drift, training-serving skew, adversarial evasion, and fairness across languages/dialects (a simple drift-check sketch also follows this list).
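To make the multi-stage requirement concrete, here is a minimal Python sketch of a write-time cascade that maps calibrated scores to the four product actions. The stage models, threshold values, and the stand-in functions `fast_filter_score` and `transformer_score` are illustrative assumptions, not a prescribed design.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    DOWNRANK = "downrank"
    REVIEW = "review"
    REMOVE = "remove"


@dataclass
class PolicyThresholds:
    # Illustrative per-action thresholds on a calibrated hate-speech probability.
    # In a real system these would live in policy config, not code, so that
    # threshold changes never require a retrain.
    downrank: float = 0.50
    review: float = 0.80
    remove: float = 0.97


def fast_filter_score(text: str) -> float:
    # Stand-in for a cheap stage-1 model (keyword list / linear classifier)
    # that runs on every comment at creation time.
    flagged_terms = {"<placeholder_term>"}
    return 0.9 if any(term in text.lower() for term in flagged_terms) else 0.01


def transformer_score(text: str) -> float:
    # Stand-in for a heavier multilingual transformer, invoked only on the
    # small fraction of comments that survive stage 1.
    return 0.6


def decide(text: str, thresholds: PolicyThresholds) -> Action:
    """Hypothetical write-time cascade: cheap filter first, expensive re-scorer only when needed."""
    if fast_filter_score(text) < 0.05:
        return Action.ALLOW  # the vast majority of comments short-circuit here
    score = transformer_score(text)
    if score >= thresholds.remove:
        return Action.REMOVE
    if score >= thresholds.review:
        return Action.REVIEW
    if score >= thresholds.downrank:
        return Action.DOWNRANK
    return Action.ALLOW


action = decide("example comment text", PolicyThresholds())
```

Keeping the thresholds in a config object rather than inside the model is one way to satisfy the constraint below that policy and threshold changes should not force a full retrain.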
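For the monitoring bullet, one common and cheap drift alarm is a population stability index (PSI) computed between a reference score distribution and the live one. The bin count, alert band, and synthetic data below are assumptions for illustration only.

```python
import numpy as np


def population_stability_index(expected: np.ndarray, observed: np.ndarray, bins: int = 10) -> float:
    """PSI between a reference score distribution (e.g. scores logged at launch time)
    and a live score distribution; a simple drift alarm for model scores or features."""
    edges = np.linspace(0.0, 1.0, bins + 1)  # scores assumed calibrated to [0, 1]
    eps = 1e-6                               # avoid log(0) on empty bins
    expected_pct = np.histogram(expected, bins=edges)[0] / max(len(expected), 1) + eps
    observed_pct = np.histogram(observed, bins=edges)[0] / max(len(observed), 1) + eps
    return float(np.sum((observed_pct - expected_pct) * np.log(observed_pct / expected_pct)))


# Hypothetical usage: compare today's serving scores against a logged baseline.
reference_scores = np.random.beta(2, 20, size=100_000)  # stand-in for baseline scores
todays_scores = np.random.beta(2, 15, size=100_000)     # stand-in for live scores
psi = population_stability_index(reference_scores, todays_scores)
if psi > 0.2:  # 0.1-0.2 is a commonly used "investigate" band
    print(f"Score drift alarm: PSI={psi:.3f}; consider recalibration or rollback.")
```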
Constraints
- False positives are costly: incorrect removals harm user trust and creator experience.
- False negatives are also costly: missed hate speech creates safety and regulatory risk.
- Some actions must happen synchronously at comment creation; others can be asynchronous within minutes.
- Labels are noisy and partially delayed: user reports, reviewer decisions, and appeals may arrive hours or days later (see the label-maturation sketch after this list).
- The system must support policy updates without requiring a full retrain for every threshold change.
- Raw text retention is limited in some regions, so feature logging must work within retention and compliance constraints.
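Because labels arrive hours or days late, offline evaluation and retraining typically wait for labels to "mature" before counting an example. The sketch below shows one assumed convention: exclude examples younger than a maturation window and let appeal outcomes override reviewer decisions. The field names and the 72-hour window are illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class LoggedExample:
    comment_id: str
    created_at: datetime
    model_score: float
    reviewer_label: Optional[bool] = None     # human decision, may arrive days later
    appeal_overturned: Optional[bool] = None  # appeal outcome, arrives even later


def matured_label(ex: LoggedExample, now: datetime,
                  maturation: timedelta = timedelta(hours=72)) -> Optional[bool]:
    """Resolve an eval/training label only after it has had time to stabilize,
    so that slow reviewer decisions and appeals do not bias metrics toward
    whatever signal happens to arrive first."""
    if now - ex.created_at < maturation:
        return None                 # too fresh: reports or appeals may still arrive
    if ex.appeal_overturned:
        return False                # final say: the removal was overturned on appeal
    return ex.reviewer_label        # may still be None if never reviewed


# Hypothetical usage over a prediction log.
now = datetime.now(timezone.utc)
log = [
    LoggedExample("c1", now - timedelta(days=5), 0.91, reviewer_label=True),
    LoggedExample("c2", now - timedelta(hours=6), 0.88, reviewer_label=None),
]
evaluable = []
for ex in log:
    label = matured_label(ex, now)
    if label is not None:
        evaluable.append((ex.model_score, label))
```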