
A B2C media company wants to build a Databricks-based recommendation and retrieval platform that combines nightly embedding generation, vector search, and online reranking for 120 million users and 15 million content items. The system must sustain 12,000 QPS on average and 30,000 QPS at peak for recommendation requests, keep P95 latency under 180 ms, and improve offline NDCG by at least 8% over the current baseline, all within hard limits of 64 A100-equivalent GPUs and $350k/month in total serving spend. Ask the candidate to design the end-to-end architecture using Databricks components such as Mosaic AI Model Serving, Vector Search, Delta Lake, Lakeflow/streaming pipelines, and Unity Catalog, and to reason through tenant isolation, autoscaling, backfills, embedding refresh cadence, and failure modes. The candidate should explicitly estimate capacity, identify bottlenecks, and explain to a non-technical CFO why the chosen design meets both the growth and cost targets better than a simpler always-on GPU fleet.
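As a starting point for the capacity estimate the prompt asks for, a strong answer might open with back-of-envelope arithmetic along the following lines. This is only a sketch: the user, item, QPS, GPU, and budget figures come from the brief, while the embedding dimension (256), float32 storage, and the 30-day month are illustrative assumptions that a candidate would be expected to state and defend.

```python
# Back-of-envelope capacity estimate for the scenario above.
# Values marked ASSUMPTION are illustrative and do not come from the brief.

USERS = 120_000_000           # from the brief
ITEMS = 15_000_000            # from the brief
AVG_QPS = 12_000              # from the brief
PEAK_QPS = 30_000             # from the brief
GPU_CAP = 64                  # A100-equivalent GPUs, from the brief
MONTHLY_BUDGET_USD = 350_000  # total serving spend, from the brief

EMBEDDING_DIM = 256                 # ASSUMPTION: embedding width
BYTES_PER_VALUE = 4                 # ASSUMPTION: float32 storage
SECONDS_PER_MONTH = 30 * 24 * 3600  # ASSUMPTION: 30-day month


def embedding_gb(rows, dim=EMBEDDING_DIM, bytes_per_value=BYTES_PER_VALUE):
    """Raw size in GB (10^9 bytes) of a dense embedding table, excluding index overhead."""
    return rows * dim * bytes_per_value / 1e9


# Storage footprint of the two embedding tables.
item_index_gb = embedding_gb(ITEMS)
user_table_gb = embedding_gb(USERS)

# Request volume and the budget it implies per million requests.
monthly_requests = AVG_QPS * SECONDS_PER_MONTH
usd_per_million_requests = MONTHLY_BUDGET_USD / (monthly_requests / 1e6)

# Worst case: every peak request needs GPU work and all 64 GPUs are serving.
peak_qps_per_gpu = PEAK_QPS / GPU_CAP

# How much headroom autoscaling has to cover between average and peak load.
peak_to_avg_ratio = PEAK_QPS / AVG_QPS

print(f"Item embeddings:      {item_index_gb:.2f} GB")
print(f"User embeddings:      {user_table_gb:.2f} GB")
print(f"Requests per month:   {monthly_requests:,}")
print(f"Budget per 1M reqs:   ${usd_per_million_requests:.2f}")
print(f"Peak QPS per GPU:     {peak_qps_per_gpu:.2f}")
print(f"Peak-to-average load: {peak_to_avg_ratio:.1f}x")
```

Under these assumptions the item embeddings come to about 15.4 GB and the user embeddings to about 122.9 GB, the budget works out to roughly $11.25 per million requests, and the peak is 2.5 times the average. The 2.5x gap is the number most relevant to the CFO conversation: an always-on fleet must be sized for the 30,000 QPS peak around the clock, whereas an autoscaled design pays for peak capacity only while the peak lasts. The per-GPU figure is deliberately pessimistic, since it ignores caching, CPU-side retrieval, and any GPUs reserved for nightly embedding generation.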