You are evaluating an LLM-based assistant that answers questions about building systems, installation guides, device manuals, and service procedures. The assistant uses retrieval over internal knowledge sources and returns answers with citations. Before wider rollout, you need a clear way to measure whether retrieval and final answers are actually good enough for real users.
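The assistant retrieves from internal sources before it answers. One way to make "good enough" measurable is therefore to score retrieval separately from the final answer. The sketch below is a minimal illustration. It assumes a small hand-labeled test set that maps each user question to the IDs of the documents that should be retrieved, and the function names, queries, and document IDs are all hypothetical. It computes recall@k, which measures how many of the relevant documents land in the top k results. It also computes mean reciprocal rank, which measures how high the first relevant document ranks.

```python
from typing import Dict, List, Set


def recall_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    # Use a set so a document retrieved twice is only counted once.
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def reciprocal_rank(retrieved: List[str], relevant: Set[str]) -> float:
    """1/rank of the first relevant result, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate_retrieval(
    runs: Dict[str, List[str]], gold: Dict[str, Set[str]], k: int = 5
) -> Dict[str, float]:
    """Average recall@k and mean reciprocal rank over every labeled query."""
    if not gold:
        raise ValueError("gold set is empty")
    n = len(gold)
    # A labeled query missing from runs counts as retrieving nothing.
    recall = sum(recall_at_k(runs.get(q, []), rel, k) for q, rel in gold.items()) / n
    mrr = sum(reciprocal_rank(runs.get(q, []), rel) for q, rel in gold.items()) / n
    return {f"recall@{k}": recall, "mrr": mrr}


# Hypothetical labeled queries and ranked retrieval output (top result first).
gold = {
    "reset VAV controller": {"manual-vav-12", "sop-hvac-03"},
    "fire panel battery spec": {"spec-fire-07"},
}
runs = {
    "reset VAV controller": ["sop-hvac-03", "faq-22", "manual-vav-12"],
    "fire panel battery spec": ["faq-09", "spec-fire-07"],
}
print(evaluate_retrieval(runs, gold, k=2))  # {'recall@2': 0.75, 'mrr': 0.75}
```

These numbers are averaged over a fixed labeled set, so they stay comparable when you change the embedding model, chunking, or k. A drop in recall points at the vector search rather than at answer generation.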
How would you evaluate an LLM-based retrieval system for a building knowledge assistant?
Topics: RAG evaluation design, retrieval versus generation decomposition, hallucination measurement, prompt injection awareness, vector search quality assessment.
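On the generation side, a cheap first proxy for hallucination is to check the citations the assistant returns against what it was actually given. This is a minimal sketch. It assumes each logged answer records three things: the cited document IDs, the IDs passed to the model as context, and hand-labeled relevant IDs. `AnswerRecord`, `citation_scores`, and the sample IDs are hypothetical. The check only compares IDs, so it catches fabricated or off-target citations. It cannot tell whether a cited passage really supports the claim, which still needs human review or a separate judge.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class AnswerRecord:
    """One logged answer (hypothetical schema)."""
    cited: List[str]      # document IDs the assistant cited in its answer
    retrieved: Set[str]   # document IDs actually passed to the model as context
    relevant: Set[str]    # hand-labeled documents that answer the question


def citation_scores(records: List[AnswerRecord]) -> Dict[str, float]:
    """Rates of fabricated, on-target, and missing citations across logged answers."""
    if not records:
        raise ValueError("need at least one answer record")
    total_cited = 0
    fabricated = 0  # cited IDs that were never in the retrieved context
    on_target = 0   # cited IDs that are in the hand-labeled relevant set
    uncited = 0     # answers that returned no citation at all
    for r in records:
        cited = set(r.cited)  # ignore duplicate citations of the same document
        if not cited:
            uncited += 1
            continue
        total_cited += len(cited)
        fabricated += len(cited - r.retrieved)
        on_target += len(cited & r.relevant)
    # If nothing was cited anywhere, both citation rates are reported as 0.0;
    # read them together with uncited_answer_rate.
    return {
        "fabricated_citation_rate": fabricated / total_cited if total_cited else 0.0,
        "citation_precision": on_target / total_cited if total_cited else 0.0,
        "uncited_answer_rate": uncited / len(records),
    }


# Hypothetical logged answers.
records = [
    AnswerRecord(["sop-hvac-03"], {"sop-hvac-03", "faq-22"}, {"sop-hvac-03", "manual-vav-12"}),
    AnswerRecord(["spec-fire-07", "spec-fire-99"], {"faq-09", "spec-fire-07"}, {"spec-fire-07"}),
    AnswerRecord([], {"faq-01"}, {"manual-ahu-02"}),
]
print(citation_scores(records))
```

A citation to a document the model never saw is wrong regardless of how fluent the answer reads. That makes the fabricated-citation rate a natural candidate for its own release gate, separate from answer-quality scores.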