Evaluate a Building Knowledge Assistant

Scenario

You are evaluating an LLM-based assistant that answers questions about building systems, installation guides, device manuals, and service procedures. The assistant uses retrieval over internal knowledge sources and returns answers with citations. Before wider rollout, you need a clear way to measure whether retrieval and final answers are actually good enough for real users.

Question

How would you evaluate an LLM-based retrieval system for a building knowledge assistant?

What this tests

RAG evaluation design
Retrieval versus generation decomposition
Hallucination measurement
Prompt injection awareness
Vector search quality assessment
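
To make the retrieval side of this decomposition concrete, here is a minimal sketch of scoring retrieval on its own, separately from the generated answer. It assumes a small hand-labeled set mapping each test question to the IDs of the documents that should be retrieved. The function names, document IDs, and choice of metrics (recall@k and mean reciprocal rank) are illustrative assumptions, not part of the original question.

```python
from typing import Dict, List, Set


def recall_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the labeled relevant documents found in the top-k results."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def reciprocal_rank(retrieved: List[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant document, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate_retrieval(
    runs: Dict[str, List[str]],    # hypothetical: question ID -> ranked doc IDs
    labels: Dict[str, Set[str]],   # hypothetical: question ID -> relevant doc IDs
    k: int = 5,
) -> Dict[str, float]:
    """Average recall@k and reciprocal rank over every labeled question."""
    if not labels:
        raise ValueError("labels must contain at least one question")
    recalls = []
    ranks = []
    for question_id, relevant in labels.items():
        # A question with no retrieval run counts as a complete miss.
        retrieved = runs.get(question_id, [])
        recalls.append(recall_at_k(retrieved, relevant, k))
        ranks.append(reciprocal_rank(retrieved, relevant))
    n = len(labels)
    return {"recall_at_k": sum(recalls) / n, "mrr": sum(ranks) / n}


if __name__ == "__main__":
    # Made-up document IDs, for illustration only.
    labels = {
        "q1": {"hvac-manual-3"},
        "q2": {"valve-guide-1", "valve-guide-2"},
    }
    runs = {
        "q1": ["thermostat-faq", "hvac-manual-3", "boiler-spec"],
        "q2": ["valve-guide-1", "pump-manual", "wiring-notes"],
    }
    print(evaluate_retrieval(runs, labels, k=2))
```

Scoring retrieval separately like this helps attribute failures: if the right document was never retrieved, the problem lies in search rather than in generation. Hallucination and citation checks on the final answer would need their own labeled set and are not covered by this sketch.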
