You are building an AI voice platform with personalization across discovery, voice selection, and content recommendations. Some predictions must react to fresh user behavior, while others can be precomputed and served cheaply.
How do you design online versus batch serving for an AI product?
- Choosing between batch precomputation and online inference
- Designing retrieval, ranking, and re-ranking stages
- Using a feature store to avoid training-serving skew
- Handling feature drift, cold start, and fallback paths
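The following is a minimal Python sketch of how these pieces might fit into one request path. Every name in it is hypothetical: `recommend`, `compute_features`, `retrieve`, `rank`, `rerank`, `UserState`, and `Item`. The plain dictionaries stand in for a batch results table and a feature store, and the scoring weights are illustrative rather than learned. The routing rule is also an assumption:

- Serve the precomputed batch list when it was computed after the user's latest event.
- Run the online retrieval, ranking, and re-ranking stages when newer behavior exists.
- Fall back to the batch list or a popularity list on cold start or online failure.

```python
from dataclasses import dataclass, field

# Hypothetical catalog entry: a voice with a category and descriptive tags.
@dataclass(frozen=True)
class Item:
    item_id: str
    category: str
    tags: frozenset

# Hypothetical online user state, updated from fresh session events.
@dataclass
class UserState:
    recent_tags: list = field(default_factory=list)
    last_event_ts: float = 0.0

def compute_features(user: UserState, item: Item) -> dict:
    # One shared feature definition for training and serving: the
    # skew-avoidance idea behind a feature store.
    return {"tag_overlap": len(item.tags & set(user.recent_tags)),
            "n_tags": len(item.tags)}

def score(features: dict) -> float:
    # Placeholder linear model; the weights are illustrative, not learned.
    return 1.0 * features["tag_overlap"] - 0.1 * features["n_tags"]

def retrieve(user: UserState, catalog: list, k: int = 50) -> list:
    # Stage 1, retrieval: cheap candidate generation (items sharing a recent tag).
    tags = set(user.recent_tags)
    return [it for it in catalog if it.tags & tags][:k]

def rank(user: UserState, candidates: list) -> list:
    # Stage 2, ranking: score every candidate with the shared features.
    return sorted(candidates, key=lambda it: score(compute_features(user, it)),
                  reverse=True)

def rerank(ranked: list, per_category: int = 1, n: int = 3) -> list:
    # Stage 3, re-ranking: apply a business rule, here at most `per_category`
    # voices from each category, truncated to `n` results.
    out, counts = [], {}
    for it in ranked:
        if counts.get(it.category, 0) < per_category:
            out.append(it.item_id)
            counts[it.category] = counts.get(it.category, 0) + 1
        if len(out) == n:
            break
    return out

def recommend(user_id, users, batch, catalog, popular) -> list:
    user = users.get(user_id)
    cached = batch.get(user_id)  # (computed_at, item_ids) from the batch job
    fallback = cached[1] if cached else popular
    if user is None or not user.recent_tags:
        # Cold start: no behavior yet, so serve batch output or popularity.
        return fallback
    if cached and cached[0] >= user.last_event_ts:
        # Batch output already reflects the latest behavior: serve it cheaply.
        return cached[1]
    try:
        # Fresh behavior since the batch run: take the online path.
        return rerank(rank(user, retrieve(user, catalog))) or fallback
    except Exception:
        # Fallback path: a failing online model should not fail the request.
        return fallback

# Sample data for illustration only.
catalog = [
    Item("v1", "narration", frozenset({"calm", "deep"})),
    Item("v2", "narration", frozenset({"calm"})),
    Item("v3", "character", frozenset({"calm", "energetic"})),
    Item("v4", "news", frozenset({"neutral"})),
]
users = {
    "alice": UserState(recent_tags=["calm"], last_event_ts=200.0),
    "bob": UserState(recent_tags=["neutral"], last_event_ts=50.0),
}
batch = {"alice": (100.0, ["v4"]), "bob": (100.0, ["v4", "v1"])}
popular = ["v4", "v1"]

for uid in ("alice", "bob", "carol"):
    print(uid, recommend(uid, users, batch, catalog, popular))
```

With this sample data, the three users take different paths:

- `alice` has behavior newer than her batch list, so she gets an online result, `['v2', 'v3']`, with one voice per category after re-ranking.
- `bob`'s batch list is fresher than his last event, so it is served as-is.
- `carol`, an unknown user, gets the popularity fallback.

Keeping `compute_features` as the only feature definition, used by both the training job and the serving path, is the property a feature store is meant to enforce.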