Product Context
AtlasRec provides a recommendations platform for enterprise customers. Each tenant is a separate retailer or media app that wants personalized recommendations on its website and mobile app, but tenant data cannot leak across customers and some tenants require all data storage, training, and serving to stay within a specific region.
Scale
| Signal | Value |
|---|
| Tenants | 1,200 total; 150 large, 1,050 SMB |
| Total DAU | 90M across all tenants |
| Peak recommendation QPS | 220K global |
| Largest tenant peak QPS | 18K |
| Catalog size | 600M items total; largest tenant 80M |
| New interaction events/day | 9B |
| Per-request latency budget (p99) | 180ms |
| Residency regions | US, EU, India, Singapore |
Task
Design an end-to-end multi-tenant recommendations service that satisfies strict privacy and data residency constraints. Address the following:
- Clarify product requirements, tenant isolation guarantees, and what level of cross-tenant sharing is allowed, if any.
- Propose a multi-stage recommendation architecture (retrieval → ranking → re-ranking) that works for both large tenants with rich data and small tenants with sparse data.
- Design the offline and online data/feature architecture, including how training, feature computation, and model serving respect regional residency rules.
- Explain how you would handle cold start, model updates, and feature freshness without introducing training-serving skew.
- Define offline and online evaluation, including tenant-level metrics, privacy/compliance guardrails, and rollout strategy.
- Identify major failure modes such as data leakage, stale regional models, feature drift, and regional outages, with detection and mitigation.
Constraints
- No raw user-level data may move across tenant boundaries.
- Some tenants prohibit even model parameter sharing across regions; assume the strictest tenants require fully region-local training and serving.
- p99 latency must stay under 180ms, including policy filtering.
- Cost matters: the platform must support many small tenants without dedicating a full GPU fleet per tenant.
- Auditable compliance is required: every feature, model, and request path must be attributable to a tenant and region.