You are designing a machine learning system to support a university research workflow across surfaces such as library search, grant discovery, expert finding, and internal knowledge access. Researchers, staff, and students should receive personalized, context-aware recommendations for papers, datasets, funding opportunities, collaborators, and university resources as they move through a research task. The goal is to reduce time spent searching across fragmented systems and improve discovery of relevant, timely resources. The system must work across both anonymous exploratory sessions and authenticated users with prior activity.

| Signal | Value |
|---|---|
| Monthly active users | 180K |
| Peak QPS | 450 |
| Searchable resource catalog | 45M items |
| New or updated items/day | 1.2M |
| Per-request latency budget (p99) | 300 ms |
| Top results returned | 20 |
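
Before sketching an architecture, it can help to turn the table into a few derived rates. The snippet below is a minimal back-of-envelope sketch, not part of the original prompt: the `Workload` and `derive` names are hypothetical, and the embedding dimension (256) and float width (4 bytes) are illustrative assumptions used only to size a dense retrieval index.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 24 * 60 * 60


@dataclass(frozen=True)
class Workload:
    """Figures copied from the signal table above."""
    monthly_active_users: int = 180_000
    peak_qps: int = 450
    catalog_items: int = 45_000_000
    updates_per_day: int = 1_200_000
    p99_latency_ms: int = 300
    top_k: int = 20


def derive(w: Workload, embedding_dim: int = 256, bytes_per_float: int = 4) -> dict:
    """Derive rough planning numbers from the workload.

    embedding_dim and bytes_per_float are ASSUMPTIONS for illustration;
    the prompt does not specify an embedding model or storage format.
    """
    return {
        # Average ingest rate if updates were spread evenly over the day.
        "avg_updates_per_sec": w.updates_per_day / SECONDS_PER_DAY,
        # Share of the catalog that is new or changed each day.
        "daily_churn_fraction": w.updates_per_day / w.catalog_items,
        # Days until the update volume equals the catalog size.
        "days_for_catalog_sized_churn": w.catalog_items / w.updates_per_day,
        # Catalog items competing for each returned result slot.
        "items_per_result_slot": w.catalog_items / w.top_k,
        # Uncompressed size of one dense vector per item (assumed dim/width).
        "raw_embedding_gb": w.catalog_items * embedding_dim * bytes_per_float / 1e9,
    }


if __name__ == "__main__":
    for name, value in derive(Workload()).items():
        print(f"{name}: {value:,.2f}")
```

Under these assumptions, ingest averages roughly 14 updates per second, about 2.7% of the catalog changes daily, and an uncompressed 256-dimensional float32 index would occupy about 46 GB. Real update traffic is likely bursty, so the average rate is a lower bound on what the indexing path must absorb.
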

How would you design this end-to-end system so it can retrieve, rank, and re-rank research resources in real time while staying fresh as new content arrives? Explain the architecture, model choices, serving strategy, evaluation plan, and how you would handle drift, skew, and operational failures.