You are designing an image loading system for a mobile social app with image-heavy surfaces such as feed, stories, profiles, and marketplace listings. Instead of using fixed caching rules, your team wants an ML system that predicts which images should stay in memory, which should be persisted on disk, and which can be evicted, with the goal of improving scroll smoothness and reducing network usage. The system must adapt to user behavior, device constraints, and network quality while avoiding crashes, stale content, and wasted storage. This is a core client performance problem because image fetch latency directly affects engagement on the app's highest-traffic surfaces.

| Signal | Value |
|---|---|
| DAU | 180M |
| Peak image requests | 2.2M QPS |
| Avg images viewed per DAU per day | 320 |
| Distinct image URLs/day | 3.5B |
| On-device memory cache budget | 32 to 128 MB, depending on device tier |
| On-device disk cache budget | 200 MB to 2 GB, depending on free storage |
| Per-image load latency target (p95) | < 80 ms on a cache hit, < 300 ms on a network fetch |
| Model decision latency budget on device | < 5 ms |
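
As a quick sanity check on the figures above, the following sketch derives a few load estimates using only the table values. It assumes that every image view corresponds to exactly one image request, which the problem statement does not say (prefetches or locally served views would change the ratio), and the function and key names are illustrative only.

```python
# Back-of-envelope load estimates derived only from the table above.
# Assumption (not stated in the problem): one image view == one image request.

DAU = 180_000_000                      # daily active users
IMAGES_PER_DAU_PER_DAY = 320           # average images viewed per user per day
PEAK_QPS = 2_200_000                   # peak image requests per second
DISTINCT_URLS_PER_DAY = 3_500_000_000  # distinct image URLs per day
SECONDS_PER_DAY = 86_400


def load_estimates() -> dict:
    """Return derived quantities useful for sizing the caching problem."""
    views_per_day = DAU * IMAGES_PER_DAU_PER_DAY
    avg_qps = views_per_day / SECONDS_PER_DAY
    return {
        "views_per_day": views_per_day,
        "avg_qps": avg_qps,
        # How much burstier the peak is than the daily average.
        "peak_to_avg": PEAK_QPS / avg_qps,
        # Aggregate reuse across ALL users, not reuse on a single device.
        "views_per_distinct_url": views_per_day / DISTINCT_URLS_PER_DAY,
    }


if __name__ == "__main__":
    for name, value in load_estimates().items():
        print(f"{name}: {value:,.2f}")
```

Under that assumption, the table implies about 57.6 billion views per day, an average of roughly 667K requests per second (so the stated peak is about 3.3 times the average), and about 16 views per distinct URL. The last figure is fleet-wide reuse; how often a single device re-requests the same image is what the on-device cache actually benefits from, and the table does not give it.
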

How would you design this end-to-end ML system so that the mobile client can decide, for each image and at scale, whether to keep it in memory, persist it to disk, or evict it? Explain the architecture, model choices, online and offline components, and evaluation approach, and describe how you would handle drift, training-serving skew, and operational failures.