You are building an ML-powered ranking system for a consumer product feed. Some predictions can be precomputed, while others depend on fresh user context and must be computed at request time. You need to decide how to split inference between the batch and online paths.
What are the trade-offs between online and batch serving for ML models?
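To make the scenario concrete, here is a minimal sketch of one possible split, not a recommended design. It assumes a batch job has already written a base score per item into a lookup table, and that a lightweight request-time function adjusts that score using fresh context. All names (`BATCH_SCORES`, `RequestContext`, `online_adjustment`, `rank`) and all score values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RequestContext:
    # Hypothetical fresh signal that only exists at request time.
    recent_clicks: List[str] = field(default_factory=list)


# Hypothetical output of an offline batch job: item_id -> precomputed base score.
BATCH_SCORES: Dict[str, float] = {"a": 0.9, "b": 0.5, "c": 0.4}


def online_adjustment(item_id: str, ctx: RequestContext) -> float:
    # Stand-in for a small request-time model: boost items the user just clicked.
    return 1.0 if item_id in ctx.recent_clicks else 0.0


def rank(candidates: List[str], ctx: RequestContext, default_score: float = 0.0) -> List[str]:
    scored = []
    for item_id in candidates:
        # Batch path: a cheap lookup, but possibly stale or missing for new items.
        base = BATCH_SCORES.get(item_id, default_score)
        # Online path: computed per request, so it adds latency and compute cost.
        scored.append((base + online_adjustment(item_id, ctx), item_id))
    # Highest score first; break ties by item_id so the order is deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item_id for _, item_id in scored]
```

With no fresh context the order follows the batch scores alone; once the user clicks on item `c`, the online adjustment lifts it above the others. An item the batch job has not seen yet (here any id missing from `BATCH_SCORES`) falls back to `default_score`, which illustrates the staleness and coverage cost of the batch path, while the per-request call illustrates the latency cost of the online path.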