An enterprise retailer wants a multimodal shopping assistant that supports text and image queries, personalized recommendations, and order-help flows. Normal traffic is 250 QPS, but Black Friday projections reach 2,500 QPS for 6 hours. P95 latency must stay under 900 ms for text-only turns and under 1.8 s for image-grounded turns. The business requires answer quality to remain within 3% of the baseline offline evaluation score while staying under a hard event budget of $95k for the week. Design the serving architecture and capacity plan using OpenAI APIs plus the necessary surrounding systems, including admission control, degradation strategies, queueing, caching, rate-limit handling, and regional failover. Then explain how you would communicate the plan and its risks to a VP of Engineering in plain, non-technical terms.
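
As a starting point for the capacity plan, the stated traffic figures can be turned into a rough weekly request count and a per-request budget ceiling. The sketch below is only a back-of-envelope estimate. It assumes a 7-day week with a single 6-hour peak window and a flat 250 QPS for the rest of the week, and it treats the full $95k as available for serving. None of these assumptions come from the prompt, and the function names are illustrative.

```python
# Back-of-envelope sizing from the figures stated in the prompt.
NORMAL_QPS = 250        # stated normal traffic
PEAK_QPS = 2_500        # stated Black Friday projection
PEAK_HOURS = 6          # stated peak duration
BUDGET_USD = 95_000     # stated hard budget for the week

# Assumption (not in the prompt): a 7-day week with one peak window
# and flat normal traffic for all remaining seconds.
WEEK_SECONDS = 7 * 24 * 3600


def peak_requests(peak_qps: int = PEAK_QPS, peak_hours: int = PEAK_HOURS) -> int:
    """Total requests served during the peak window."""
    return peak_qps * peak_hours * 3600


def weekly_requests(
    normal_qps: int = NORMAL_QPS,
    peak_qps: int = PEAK_QPS,
    peak_hours: int = PEAK_HOURS,
    week_seconds: int = WEEK_SECONDS,
) -> int:
    """Total requests over the week: peak window plus flat normal traffic."""
    peak_seconds = peak_hours * 3600
    normal_seconds = week_seconds - peak_seconds
    return peak_qps * peak_seconds + normal_qps * normal_seconds


def budget_per_request(budget_usd: float = BUDGET_USD) -> float:
    """Average spend ceiling per request if the whole budget goes to serving."""
    return budget_usd / weekly_requests()


if __name__ == "__main__":
    print(f"peak requests:      {peak_requests():,}")
    print(f"weekly requests:    {weekly_requests():,}")
    print(f"budget per request: ${budget_per_request():.6f}")
```

Under these assumptions the peak window alone accounts for 54 million requests, the week totals roughly 200 million, and the average ceiling works out to under $0.0005 per request. A ceiling that tight suggests that caching, routing simple turns to cheaper models, and admission control matter to the design, although real traffic will follow a daily cycle rather than a flat rate, so the numbers are only a first approximation.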