Context
FinFlow, a mid-market fintech, runs a customer-facing AI assistant that answers account, billing, and product-policy questions inside its web app. The assistant is helpful but occasionally fabricates policy details or invents unsupported troubleshooting steps, creating compliance and trust risk.
Constraints
- p95 latency must stay under 2,500ms end-to-end
- Cost ceiling: $0.035 per request and $45K/month total at ~1.2M requests/month (the per-request cap implies ~$42K/month at that volume; see the arithmetic sketch after this list)
- Hallucination rate must be below 1.5% on a labeled customer-support golden set
- For questions not supported by approved sources, the assistant must refuse or escalate rather than guess
- Must resist prompt injection from user input and retrieved documents
- Responses must not expose PII or internal-only policy text
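For concreteness, the two cost caps are mutually consistent at the stated volume. A quick check, using only figures from the constraints above:

```python
# Sanity-check the cost constraints: per-request cap vs. monthly cap.
PER_REQUEST_CAP = 0.035        # USD per request (stated above)
MONTHLY_CAP = 45_000           # USD per month (stated above)
MONTHLY_REQUESTS = 1_200_000   # requests per month (stated above)

implied_monthly = PER_REQUEST_CAP * MONTHLY_REQUESTS   # $42,000
headroom = MONTHLY_CAP - implied_monthly               # $3,000

print(f"Per-request cap implies ${implied_monthly:,.0f}/month, "
      f"leaving ${headroom:,.0f} under the ${MONTHLY_CAP:,} monthly cap.")
```

The per-request cap is the binding constraint at 1.2M requests/month; the monthly cap only binds if volume grows past roughly 1.29M requests.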
Available Resources
- 80K approved support articles, help-center pages, policy documents, and troubleshooting runbooks
- 18 months of historical support chats with resolution labels and escalation outcomes
- Product catalog metadata, account-state APIs, and a ticket-escalation tool
- Access to a production-approved LLM, an embedding model, and a hybrid search index
- 2,000 human-labeled evaluation examples, including unanswerable and adversarial prompts (an illustrative record shape is sketched below)
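To make the golden set concrete, one plausible record shape covering answerable, unanswerable, and adversarial slices might look like the following. The field names and slice taxonomy are assumptions for illustration, not an existing FinFlow schema:

```python
from dataclasses import dataclass
from typing import Literal, Optional

# Illustrative golden-set record. All field names and the slice
# taxonomy are assumptions, not an existing FinFlow schema.
@dataclass
class GoldenExample:
    example_id: str
    user_message: str
    slice: Literal["answerable", "unanswerable", "adversarial"]
    expected_behavior: Literal["answer", "refuse", "escalate"]
    reference_answer: Optional[str]   # None for refuse/escalate cases
    supporting_doc_ids: list[str]     # approved sources a correct answer cites
```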
Task
Design a workflow to reduce hallucinations in this assistant while preserving user experience.
- Propose an evaluation-first plan: define offline and online metrics, golden-set slices, and launch gates before describing the architecture (see the launch-gate sketch after this list).
- Design the end-to-end workflow, including prompt strategy, retrieval, grounding, refusal behavior, and when to call tools or escalate to a human (see the grounded-answer sketch after this list).
- Explain how you would defend against hallucinated claims, unsupported citations, prompt injection, stale documents, and account-specific mistakes (see the citation-check sketch after this list).
- Estimate cost and latency for your design, and describe what you would change if you were over either budget.
- Outline how you would monitor regressions after launch and safely iterate on prompts, retrieval, or models (see the regression-monitor sketch after this list).
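For the evaluation-first plan, a minimal sketch of an offline launch gate over the golden set follows. The 1.5% hallucination ceiling comes from the constraints; the other thresholds and the result-record fields are illustrative assumptions:

```python
# Offline launch gate over golden-set results. Each result dict is assumed
# to carry: "slice" (answerable/unanswerable/adversarial) plus boolean
# "hallucinated", "refused_or_escalated", and "correct" flags.

def slice_rate(results: list[dict], slice_name: str, flag: str) -> float:
    rows = [r for r in results if r["slice"] == slice_name]
    return sum(r[flag] for r in rows) / len(rows) if rows else 0.0

def passes_launch_gate(results: list[dict]) -> bool:
    hallucination_rate = sum(r["hallucinated"] for r in results) / len(results)
    return (
        hallucination_rate < 0.015  # hard constraint from the brief
        and slice_rate(results, "unanswerable", "refused_or_escalated") >= 0.98  # assumed gate
        and slice_rate(results, "adversarial", "refused_or_escalated") >= 0.98   # assumed gate
        and slice_rate(results, "answerable", "correct") >= 0.90                 # assumed gate
    )
```

Gating on per-slice rates rather than a single aggregate prevents a model that over-refuses (acing the unanswerable slice) from hiding accuracy regressions on answerable questions.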
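For the end-to-end workflow, here is a sketch of the core answer path: retrieve from approved sources, instruct the model to answer only from the retrieved excerpts, and refuse or escalate otherwise. `hybrid_search`, `llm_complete`, and `escalate_ticket` are stand-ins for FinFlow's actual search index, production-approved LLM, and escalation tool, and the prompt wording is illustrative:

```python
# Grounded-answer path with explicit refusal. The callables are stand-ins
# injected by the caller; nothing here is FinFlow's actual API.
GROUNDED_PROMPT = """Answer ONLY from the numbered excerpts below, and cite
excerpt numbers like [1] for every claim. If the excerpts do not answer the
question, reply exactly: CANNOT_ANSWER.

Excerpts:
{excerpts}

Question: {question}"""

def answer(question: str, hybrid_search, llm_complete, escalate_ticket) -> str:
    docs = hybrid_search(question, top_k=6)   # approved sources only
    if not docs:
        return escalate_ticket(question, reason="no supporting sources found")
    excerpts = "\n".join(f"[{i + 1}] {d.text}" for i, d in enumerate(docs))
    reply = llm_complete(GROUNDED_PROMPT.format(excerpts=excerpts,
                                                question=question))
    if "CANNOT_ANSWER" in reply:
        return escalate_ticket(question, reason="not supported by approved sources")
    return reply
```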
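For defending against unsupported citations, one cheap post-generation check: every cited excerpt number must actually exist, and every sentence must carry at least one citation before the reply is shown. The regex-based sentence split is a deliberate simplification:

```python
import re

# Reject replies that cite nonexistent excerpts or contain uncited sentences.
# Sentence splitting via regex is a simplification; a real system would also
# verify that each cited excerpt semantically supports its sentence.
def citations_supported(reply: str, num_excerpts: int) -> bool:
    cited = {int(m) for m in re.findall(r"\[(\d+)\]", reply)}
    if not cited or any(c < 1 or c > num_excerpts for c in cited):
        return False  # no citations at all, or a fabricated one
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", reply) if s.strip()]
    return all(re.search(r"\[\d+\]", s) for s in sentences)
```

A reply that fails this check would be routed to the refusal/escalation path rather than shown to the user.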
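For post-launch monitoring, a sketch of a daily regression check over sampled production traffic. `sample_traffic`, `judge_hallucination` (e.g., an LLM-as-judge or a human review queue), and `page_oncall` are stand-ins; the 1.5% ceiling is the stated constraint, and the early-warning margin is an assumption:

```python
# Daily hallucination-rate monitor over a production sample.
def daily_regression_check(sample_traffic, judge_hallucination, page_oncall,
                           sample_size: int = 500) -> float:
    sample = sample_traffic(n=sample_size)
    rate = sum(judge_hallucination(turn) for turn in sample) / sample_size
    if rate >= 0.015:    # hard constraint breached
        page_oncall(f"hallucination rate {rate:.2%} over ceiling", severity="page")
    elif rate >= 0.012:  # assumed early-warning margin
        page_oncall(f"hallucination rate {rate:.2%} nearing ceiling", severity="warn")
    return rate
```

The same harness can gate iteration on prompts, retrieval, or models: run the candidate in a shadow deployment, score it on the same sampled traffic, and ship only if its rate is no worse than the incumbent's.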