A Fortune 100 customer wants an enterprise analytics assistant that can answer policy questions, generate SQL over governed data, and route to specialist agents for documentation lookup, KPI explanation, and incident triage.

Design a multi-agent system using the Databricks Agent Framework in which a planner/router agent coordinates 3-5 specialist agents, all governed by Unity Catalog and backed by Delta Lake tables plus Databricks Vector Search over unstructured corpora. The service must sustain 40 QPS average and 100 QPS peak, with P95 < 1.8 s for simple Q&A, P95 < 3.5 s for SQL+RAG workflows, and a hard monthly inference budget of $180k on no more than 24 GPU-equivalent serving replicas.

Explain why you would build this on Databricks rather than on a fragmented stack: how Spark on Databricks drives Vector Search ETL and index freshness, how agent traces and evaluations are logged in MLflow, and how you would use MLflow Agent Evaluation with LLM-as-Judge plus faithfulness/groundedness metrics to compare single-agent versus multi-agent designs.

Be explicit about failure modes such as agent routing loops, stale embeddings after Delta updates, and SQL safety, and describe the architectural changes you would make if groundedness improves but latency and cost violate the SLOs.
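The stated budget and traffic numbers imply a per-request cost ceiling worth computing before any design work. A quick sanity check, assuming a 30-day month (the variable names are illustrative, not part of any Databricks API):

```python
# Derive the implied per-request inference budget from the stated constraints.
MONTHLY_BUDGET_USD = 180_000
AVG_QPS = 40
MAX_REPLICAS = 24
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: 30-day month

requests_per_month = AVG_QPS * SECONDS_PER_MONTH             # 103,680,000 requests
cost_per_request = MONTHLY_BUDGET_USD / requests_per_month   # roughly $0.0017
per_replica_monthly = MONTHLY_BUDGET_USD / MAX_REPLICAS      # $7,500 per replica
```

At under a fifth of a cent per average request, the budget rules out routing every query through the full multi-agent pipeline; cheap intent classification up front is effectively mandatory.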
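One of the failure modes named above, agent routing loops, is commonly mitigated with a hop budget and a repeated-edge check inside the planner. A minimal sketch; the `RouterGuard` class and its thresholds are hypothetical, not part of the Databricks Agent Framework:

```python
from collections import Counter

class RoutingLoopError(RuntimeError):
    """Raised when the planner exceeds its hop budget or repeats a route."""

class RouterGuard:
    """Tracks planner -> specialist hops within a single request.

    Two guards: a hard cap on total hops, and a cap on how many times any
    single (source, target) edge may repeat before we declare a loop.
    """
    def __init__(self, max_hops: int = 6, max_edge_repeats: int = 2):
        self.max_hops = max_hops
        self.max_edge_repeats = max_edge_repeats
        self.hops = 0
        self.edges = Counter()

    def record(self, source: str, target: str) -> None:
        """Register one routing decision; raise if it looks like a loop."""
        self.hops += 1
        self.edges[(source, target)] += 1
        if self.hops > self.max_hops:
            raise RoutingLoopError(f"hop budget {self.max_hops} exceeded")
        if self.edges[(source, target)] > self.max_edge_repeats:
            raise RoutingLoopError(f"route {source}->{target} repeated too often")
```

On a `RoutingLoopError` the planner would fall back to a direct answer or escalate to a human, rather than retrying the same route and burning latency budget.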
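SQL safety, another failure mode above, is typically enforced by rejecting anything but single, read-only statements against allowlisted schemas before execution. A coarse pre-flight sketch; the function name, regexes, and allowlist are illustrative, and Unity Catalog grants remain the authoritative control:

```python
import re

# Keywords the assistant must never emit; the service account is read-only.
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|merge|drop|alter|create|truncate|grant|revoke)\b",
    re.IGNORECASE,
)

def check_generated_sql(sql: str, allowed_schemas: set) -> bool:
    """Return True only for a single read-only statement whose FROM/JOIN
    targets all live in allowlisted schemas. A coarse lexical check, not
    a parser; defense in depth on top of catalog permissions."""
    stripped = sql.strip().rstrip(";")
    if ";" in stripped:              # reject multi-statement payloads
        return False
    if FORBIDDEN.search(stripped):   # reject any write/DDL keyword
        return False
    if not stripped.lower().startswith(("select", "with")):
        return False
    # Every schema-qualified FROM/JOIN target must use an allowed schema.
    for schema, _table in re.findall(
        r"\b(?:from|join)\s+(\w+)\.(\w+)", stripped, re.IGNORECASE
    ):
        if schema.lower() not in allowed_schemas:
            return False
    return True
```

A lexical guard like this catches the obvious prompt-injection payloads cheaply; genuinely governed access still comes from running the SQL agent under a catalog principal with SELECT-only grants.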