You are building a customer-support copilot for Databricks docs, release notes, and support KBs using Mosaic AI. The system must meet the following service targets:

- 120 QPS steady-state and 300 QPS peak
- P95 end-to-end latency under 900 ms for answer generation, with a retrieval budget under 120 ms
- average serving cost under $2.50 per 1,000 requests, using Databricks Model Serving and/or Foundation Model APIs

Design the full RAG pipeline on Databricks: Spark + Delta Lake ETL for document ingestion and chunking, Unity Catalog governance, a Databricks Vector Search indexing and refresh strategy, retrieval and reranking, prompt construction, and model selection (for example, DBRX versus hosted Foundation Model APIs) under the latency/cost tradeoff.

Then describe how you would evaluate and continuously improve the system using MLflow Agent Evaluation, including LLM-as-Judge, RAG Faithfulness, and RAG Groundedness metrics. Finally, explain which signals would block rollout if offline quality is high but online hallucination complaints rise after a docs refresh.
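A reasonable answer would start with the chunking step of the ETL stage. Below is a minimal sketch of overlapping word-window chunking; the function name, window size, and overlap are illustrative assumptions, not tuned values, and in the actual pipeline this logic would run inside a Spark pandas UDF over a Delta table of raw documents rather than as a standalone function.

```python
def chunk_document(text: str, max_words: int = 300, overlap: int = 50) -> list[str]:
    """Split a document into overlapping word-window chunks.

    Illustrative sketch: in the Databricks pipeline this would be wrapped
    in a Spark pandas UDF applied to a Delta table of ingested docs, with
    the resulting chunks written to a Delta table that feeds the Vector
    Search index. Window/overlap sizes here are assumptions, not defaults.
    """
    words = text.split()
    if len(words) <= max_words:
        return [text]
    chunks: list[str] = []
    step = max_words - overlap  # advance by window minus overlap
    for start in range(0, len(words), step):
        window = words[start:start + max_words]
        chunks.append(" ".join(window))
        if start + max_words >= len(words):
            break  # final window already covers the tail
    return chunks
```

Overlap preserves context that straddles chunk boundaries, at the cost of a slightly larger index; the right sizes depend on the embedding model's context window and the retrieval latency budget.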
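For the rollout-blocking question, a candidate could make the signals concrete with a small gating function. The sketch below is a hypothetical policy, not a Databricks API: the thresholds and field names are assumptions chosen to illustrate the key failure mode, where offline judge scores stay high (the eval set predates the docs refresh) while online complaints rise because the vector index is serving stale chunks.

```python
from dataclasses import dataclass


@dataclass
class QualitySignals:
    """Illustrative rollout signals; names and thresholds are assumptions."""
    offline_faithfulness: float    # mean LLM-judge faithfulness on the eval set
    offline_groundedness: float    # mean LLM-judge groundedness on the eval set
    online_complaint_rate: float   # hallucination complaints per 1k requests
    stale_chunk_fraction: float    # retrieved chunks older than the last docs refresh


def rollout_blockers(s: QualitySignals) -> list[str]:
    """Return the signals that would block rollout; an empty list means ship."""
    reasons = []
    if s.offline_faithfulness < 0.90 or s.offline_groundedness < 0.90:
        reasons.append("offline judge metrics below threshold")
    if s.online_complaint_rate > 2.0:
        reasons.append("online hallucination complaint rate spiked")
    if s.stale_chunk_fraction > 0.30:
        reasons.append("vector index serving stale chunks post-refresh")
    return reasons
```

The point of the sketch is that offline metrics alone cannot gate the release: the complaint rate and index-freshness signals catch the case where the eval set and the refreshed corpus have drifted apart, which should trigger regenerating the eval set and re-syncing the index before rollout.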