Grounded Support Assistant Prompt Design

Scenario

You are building a support assistant for an internal operations team that answers questions about policies, workflows, and product procedures using only internal documentation. The assistant will serve a few hundred support agents and handle roughly 8,000 questions per day across a corpus of policy docs, SOPs, and knowledge-base articles. Existing keyword search is slow and often returns outdated pages, so the team wants a grounded assistant that can answer directly and cite sources. Trust matters more than coverage: unsupported answers are worse than refusals.

Constraints

p95 latency must stay under 2,000ms end-to-end
Cost ceiling is $6,000/month at projected volume
Hallucination rate must be below 2% on a labeled evaluation set
The assistant must answer only from retrieved internal documents and cite sources
It must resist prompt injection in both user messages and retrieved documents

Available Resources

~40,000 internal documents with metadata such as owner, last-updated date, and access scope
An approved LLM API, embedding model, and managed vector database
Search logs and a backlog of historical support tickets
Capacity for ~300 human-labeled evaluation questions and monthly refreshes
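The 2% hallucination target and the ~300-question evaluation set constrain each other, because a single hallucination moves the observed rate by about 0.33 percentage points. The brief does not specify a statistical test. One way to make the interaction concrete (our choice) is a one-sided 95% Clopper-Pearson upper bound on the true rate, computed below with only the standard library. It assumes each question is labeled pass/fail for hallucination independently. The function names are hypothetical.

```python
import math


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def upper_bound(k: int, n: int, alpha: float = 0.05) -> float:
    """One-sided (1 - alpha) Clopper-Pearson upper bound for k failures in n trials.

    Solves P(X <= k | n, p) = alpha for p by bisection. The CDF decreases
    monotonically in p, so the bisection converges to the unique root.
    """
    if k >= n:
        return 1.0
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if binom_cdf(k, n, mid) > alpha:
            lo = mid  # seeing <= k failures is still too likely: bound is higher
        else:
            hi = mid
    return hi


def max_failures_allowed(n: int, target: float, alpha: float = 0.05) -> int:
    """Largest failure count whose upper bound stays below target (-1 if none)."""
    k = -1
    while k + 1 <= n and upper_bound(k + 1, n, alpha) < target:
        k += 1
    return k


if __name__ == "__main__":
    n, target = 300, 0.02
    for k in range(4):
        print(f"{k} hallucinations / {n}: upper bound {upper_bound(k, n):.4f}")
    print("max hallucinations allowed:", max_failures_allowed(n, target))
```

Under this reading, a 300-question set confirms the 2% target at 95% confidence only if it contains at most one hallucination. The bound is roughly 1.6% for one failure and roughly 2.1% for two. If the target is instead read as an observed rate, up to five failures (1.67%) would pass. That gap is worth settling before deciding how the monthly refreshes are pooled.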

Question

How would you design the prompt and surrounding retrieval flow so the assistant answers only from internal documents, refuses when evidence is missing, and remains reliable under latency, cost, and safety constraints? Explain how you would evaluate the system before launch and monitor it in production.
