Tell me about a time you discovered that a customer-facing RAG application was producing answers that were not sufficiently faithful to, or grounded in, enterprise data. How did you use tools such as Databricks Vector Search, MLflow Agent Evaluation, LLM-as-Judge, or metrics like RAG Faithfulness and RAG Groundedness to diagnose the issue, redesign the pipeline, and restore customer trust? Please walk through the customer context, the tradeoffs you made, and the measurable outcome.