
Meta's mobile infrastructure team needs to upgrade the networking layer used by the Facebook app on iOS and Android. The goal is to add standardized retry logic and error handling for Graph API and media requests so the app is more resilient on poor networks without causing duplicate writes, battery drain, or backend traffic spikes. You are the directly responsible individual (DRI) coordinating the rollout across 8 engineers, 1 product manager, 1 data scientist, 1 SRE, and partner teams in Feed and Reels.
The Director of Mobile Infrastructure wants a production launch within 10 weeks because networking reliability is now an org-level priority. Feed and Reels engineering want the new layer quickly to reduce user-visible failures, but the SRE team is concerned that aggressive retries could increase server load during incidents. Privacy and Integrity teams require that request logging and client-side diagnostics avoid storing sensitive payload data.
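To make the tension between resilience and server load concrete, here is a minimal sketch of one possible retry policy: capped exponential backoff with full jitter, a hard attempt limit, and no automatic retry of non-idempotent writes. Every name and value in it (`RetryPolicy`, `known_safe_to_repeat`, the status set, the delays) is a hypothetical illustration, not something specified by the brief or an existing Meta API.

```python
import random
from dataclasses import dataclass

# Hypothetical defaults for illustration; none of these values come from the brief.
IDEMPOTENT_METHODS = {"GET", "HEAD", "PUT", "DELETE"}
RETRYABLE_STATUS = {408, 429, 500, 502, 503, 504}


@dataclass
class RetryPolicy:
    max_attempts: int = 3      # total tries, including the first one
    base_delay_s: float = 0.5  # backoff ceiling before the first retry
    max_delay_s: float = 8.0   # hard cap on any single delay

    def should_retry(self, method, status, attempt, known_safe_to_repeat=False):
        """Return True if a failed request may be sent again.

        `attempt` is the number of tries already made (1 after the first).
        `status` is the HTTP status code, or None for a transport failure.
        """
        if attempt >= self.max_attempts:
            return False
        # Non-idempotent writes are retried only when the caller asserts the
        # endpoint already tolerates repeats, which avoids duplicate writes.
        if method.upper() not in IDEMPOTENT_METHODS and not known_safe_to_repeat:
            return False
        return status is None or status in RETRYABLE_STATUS

    def delay_s(self, attempt, rng=random):
        """Full jitter: uniform in [0, min(cap, base * 2**(attempt - 1))]."""
        ceiling = min(self.max_delay_s, self.base_delay_s * 2 ** (attempt - 1))
        return rng.uniform(0.0, ceiling)
```

The jitter spreads retries out in time so that clients failing together during an incident do not retry together, and the attempt limit bounds the extra radio wake-ups per request. Because backend changes are out of scope this quarter, the sketch assumes a write is unsafe to repeat unless told otherwise.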
You have 10 weeks, no additional headcount, and a remaining budget of $120,000 for QA devices, load testing, and on-call launch support. The networking layer currently handles 1.4 billion daily requests across the Facebook app, and 18% of mobile sessions occur on unreliable networks. The team can only modify the shared mobile client layer and cannot require backend schema changes this quarter. The app size increase must stay under 400 KB, p95 request latency cannot worsen by more than 3%, and the rollout must start with 5% of daily active users (DAU) before expanding.
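The numeric limits above can be written down as an automated launch gate. The following is a rough sketch under stated assumptions: the three thresholds are taken from the brief, while the function names, the `salt` value, and the hash-based bucketing scheme are hypothetical choices made for illustration.

```python
import hashlib

# Limits taken from the brief.
MAX_APP_SIZE_INCREASE_KB = 400   # increase must stay under this
MAX_P95_REGRESSION = 0.03        # p95 may worsen by at most 3%
INITIAL_ROLLOUT_FRACTION = 0.05  # first stage covers 5% of DAU


def in_rollout(user_id, fraction, salt="net-layer-v2"):
    """Deterministically decide whether a user is in the rollout cohort.

    Hashing keeps a user in the same bucket across sessions. The salt is a
    hypothetical experiment name, not one from the brief.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # value in [0, 1)
    return bucket < fraction


def guardrail_violations(size_increase_kb, p95_before_ms, p95_after_ms):
    """Return the names of any launch guardrails the candidate build breaks."""
    violations = []
    if size_increase_kb >= MAX_APP_SIZE_INCREASE_KB:
        violations.append("app_size")
    if p95_after_ms > p95_before_ms * (1 + MAX_P95_REGRESSION):
        violations.append("p95_latency")
    return violations
```

For example, `guardrail_violations(350, 200.0, 210.0)` flags only the latency guardrail, since 210 ms is a 5% regression against a 200 ms baseline. Expanding beyond the first stage would then mean raising `fraction` only while the list of violations stays empty.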