Product Context
Design the ML system behind ranking listings in Facebook Marketplace search and browse surfaces. Buyers expect relevant, fresh, and trustworthy listings, while sellers expect new inventory to become discoverable quickly.
Scale
| Signal | Value |
|---|---|
| DAU touching Marketplace | 120M |
| Peak query/feed QPS | 900K |
| Active listings | 350M |
| New or updated listings/day | 18M |
| Candidate pool before ranking | ~20K listings/request |
| Final results returned | Top 50 |
| End-to-end p99 latency budget | 180ms |
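The scale numbers above imply a hard cost constraint on the ranking funnel. A quick back-of-envelope sketch (the 500-candidate heavy-ranker cut is an illustrative assumption, not a figure from the table):

```python
# Back-of-envelope funnel math from the Scale table.
PEAK_QPS = 900_000
CANDIDATES = 20_000   # pool entering ranking, per request
HEAVY_CUT = 500       # assumed survivors of a lightweight first-stage ranker
FINAL_K = 50

# A cheap first stage (dot products, GBDT-lite) must touch every candidate;
# the expensive model only sees the survivors.
light_scores_per_sec = PEAK_QPS * CANDIDATES
heavy_scores_per_sec = PEAK_QPS * HEAVY_CUT

print(f"lightweight scores/sec: {light_scores_per_sec:,}")  # 18,000,000,000
print(f"heavy ranker scores/sec: {heavy_scores_per_sec:,}")  # 450,000,000
```

Even at a 40x reduction before the heavy model, peak load is hundreds of millions of heavy scorings per second, which is why the constraints below rule out deep models on the full candidate set.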
Task
Design an end-to-end ML system that uses a scalable serving architecture with load balancing, caching, database sharding, and consistency-aware data design, while still following a standard ML retrieval → ranking → re-ranking flow.
Address the following:
- Define the functional and non-functional requirements, including freshness, latency, and relevance goals.
- Propose the full architecture: offline training, online serving, feature storage, candidate retrieval, ranking, re-ranking, and feedback logging.
- Explain how you would use caching, load balancing, and sharded storage for user features, listing features, embeddings, and interaction logs at Meta scale.
- Discuss consistency choices across systems: for example, when eventual consistency is acceptable versus when stronger consistency is needed for listing availability, price changes, or policy removals.
- Define offline and online evaluation, including how you would detect feature drift, training-serving skew, and regressions in freshness or trust signals.
- Identify major failure modes and mitigation plans, especially around stale cache entries, hot shards, feature store outages, and delayed model updates.
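One way to make the expected retrieval → ranking → re-ranking flow concrete is a single funnel function. This is a minimal sketch, not a prescribed implementation: the 500-candidate cut, the dict-shaped listing records, and the pluggable scorer signatures are all illustrative assumptions.

```python
def rank_pipeline(query, retrievers, light_score, heavy_score, rerank,
                  k=50, light_cut=500):
    """Retrieval -> lightweight ranking -> heavy ranking -> re-ranking."""
    # 1. Union candidate sets from several retrieval sources
    #    (ANN embedding search, term match, category browse), deduped by id.
    seen = {}
    for retrieve in retrievers:
        for listing in retrieve(query):
            seen[listing["id"]] = listing
    candidates = list(seen.values())

    # 2. A cheap model trims the ~20K-listing pool to a few hundred.
    candidates.sort(key=lambda l: light_score(query, l), reverse=True)
    pool = candidates[:light_cut]

    # 3. The expensive model scores only the survivors.
    for listing in pool:
        listing["score"] = heavy_score(query, listing)
    pool.sort(key=lambda l: l["score"], reverse=True)

    # 4. Re-rank for diversity / trust / freshness, then truncate to top k.
    return rerank(pool)[:k]


# Toy usage with stand-in retrievers and scorers.
retrievers = [
    lambda q: [{"id": "a"}, {"id": "b"}],
    lambda q: [{"id": "b"}, {"id": "c"}],  # "b" is deduped across sources
]
light = lambda q, l: ord(l["id"])    # pretend cheap relevance score
heavy = lambda q, l: -ord(l["id"])   # pretend expensive model disagrees
top = rank_pipeline("bike", retrievers, light, heavy, rerank=lambda p: p, k=2)
print([l["id"] for l in top])  # ['a', 'b']
```

Feedback logging would hook in after step 4, emitting the served list plus feature snapshots to the interaction log stream so training data matches what was actually shown.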
Constraints
- Fresh listings should become retrievable within 5 minutes of creation.
- Removed, sold, or policy-violating listings must stop being served within 1 minute.
- Serving should assume Meta-native infrastructure such as TAO, Memcache, Kafka/PubSub-style log streams, and regionally distributed inference services.
- Cost matters: the design should avoid expensive per-request deep models on the full candidate set.
- The system must support graceful degradation during partial outages without returning unsafe or obviously stale results.
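The 1-minute takedown constraint is stricter than typical cache TTLs, so cached result pages need a serve-time guard. A minimal sketch of that idea, assuming removal events arrive on a pub/sub stream and that tombstones only need to outlive the longest result-cache TTL (the class name and TTL value are hypothetical):

```python
import time

class RemovalFilter:
    """Serve-time guard for the 1-minute takedown constraint.

    Removal events (sold, deleted, policy takedown) populate a small
    tombstone set; every response, cached or not, is filtered against it.
    """

    def __init__(self, ttl_s=3600):
        self.removed = {}  # listing_id -> removal timestamp
        self.ttl_s = ttl_s  # assumed max result-cache TTL

    def mark_removed(self, listing_id, now=None):
        # In practice this would be driven by a Kafka/PubSub consumer.
        self.removed[listing_id] = time.time() if now is None else now

    def filter_results(self, listings, now=None):
        now = time.time() if now is None else now
        # Prune tombstones older than the cache TTL: no cached page can
        # still contain those listings, so the set stays small.
        self.removed = {i: t for i, t in self.removed.items()
                        if now - t < self.ttl_s}
        return [l for l in listings if l["id"] not in self.removed]


guard = RemovalFilter(ttl_s=100)
guard.mark_removed("sold_bike", now=0)
page = [{"id": "sold_bike"}, {"id": "fresh_sofa"}]
print([l["id"] for l in guard.filter_results(page, now=10)])  # ['fresh_sofa']
```

This keeps removals at strong, immediate consistency on the serving path while listing features, embeddings, and counters stay eventually consistent.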