Context
BrightDesk is building an internal AI assistant for sales engineers and product managers. One common use case is answering stakeholder questions in plain business language, for example: "What's the difference between fine-tuning and RAG, and when should we use each?"
Constraints
- p95 latency: ≤1,500 ms
- Cost ceiling: $3,000/month at 20,000 requests/month
- Hallucination rate: <2% on a 150-question golden set
- Answers must be understandable to non-technical stakeholders and avoid unnecessary jargon
- The assistant must not invent company capabilities, customer examples, or ROI claims
- Prompt injection and unsupported claims are considered real production risks
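The cost ceiling implies a hard per-request budget, which in turn bounds how much retrieved context each prompt can carry. A quick sketch of the arithmetic; the per-token prices are illustrative assumptions, not BrightDesk's actual rates:

```python
# Per-request budget implied by the constraints above.
MONTHLY_COST_CEILING = 3_000.00   # USD
MONTHLY_REQUESTS = 20_000

per_request_budget = MONTHLY_COST_CEILING / MONTHLY_REQUESTS  # $0.15/request

# Assumed (illustrative) hosted-LLM prices, USD per 1K tokens.
PRICE_IN_PER_1K = 0.003
PRICE_OUT_PER_1K = 0.015

def request_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimated cost of one request at the assumed prices."""
    return (prompt_tokens / 1000) * PRICE_IN_PER_1K \
         + (completion_tokens / 1000) * PRICE_OUT_PER_1K

# Example: a RAG prompt with ~3K tokens of retrieved context, ~400-token answer.
cost = request_cost(prompt_tokens=3_000, completion_tokens=400)
print(f"budget/request: ${per_request_budget:.2f}, est. cost: ${cost:.4f}")
```

At these assumed prices a context-heavy RAG request costs roughly a tenth of the $0.15 ceiling, so cost pressure is unlikely to be the binding constraint; latency is.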
Available Resources
- A curated internal knowledge base with 40 short documents: AI glossary, architecture patterns, pricing notes, case studies, and approved messaging
- 200 historical stakeholder questions with human-written answers
- Access to a hosted LLM, embedding model, and vector search index
- PM and solutions engineering reviewers who can label a small evaluation set
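The 200 historical Q&A pairs are enough to seed the 150-question golden set plus a held-out slice for prompt iteration. A minimal, reproducible split sketch; the dict shape of the pairs is an assumption:

```python
import random

# Hypothetical shape for the 200 historical stakeholder Q&A pairs.
qa_pairs = [{"question": f"Q{i}", "reference_answer": f"A{i}"} for i in range(200)]

rng = random.Random(42)  # fixed seed so reviewers can reproduce the split
shuffled = qa_pairs[:]
rng.shuffle(shuffled)

golden_set = shuffled[:150]  # frozen set for hallucination/factuality scoring
dev_set = shuffled[150:]     # used for prompt iteration, never for final reporting

assert len(golden_set) == 150 and len(dev_set) == 50
```

Freezing the golden set before any prompt tuning keeps the <2% hallucination measurement honest; the dev slice absorbs iteration.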
Task
- Propose whether you would solve this primarily with prompt design, RAG, fine-tuning, or a combination, and justify the choice for this use case.
- Design an evaluation plan first: define offline and online metrics for clarity, factuality, hallucination, and stakeholder usefulness.
- Write a system prompt that explains fine-tuning vs RAG in plain English, includes when to use each, and instructs the assistant to refuse unsupported business claims.
- Describe the serving architecture, including whether retrieval is needed, how you would ground answers, and how you would keep latency and cost within budget.
- Identify likely failure modes such as jargon-heavy answers, hallucinated examples, and prompt injection, and explain mitigations.
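As one concrete mitigation for hallucinated examples, generated answers can be screened for sentences that share little vocabulary with the retrieved passages before they are shown to stakeholders. A rough heuristic sketch; the tokenization and the 0.3 overlap threshold are assumptions, and this supplements rather than replaces human review:

```python
import re

def tokens(text: str) -> set[str]:
    """Lowercased word tokens, dropping very short stopword-like words."""
    return {w for w in re.findall(r"[a-z0-9']+", text.lower()) if len(w) > 3}

def unsupported_sentences(answer: str, retrieved_passages: list[str],
                          min_overlap: float = 0.3) -> list[str]:
    """Flag answer sentences whose content words barely appear in any passage."""
    passage_vocab = set().union(*(tokens(p) for p in retrieved_passages))
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        words = tokens(sentence)
        if not words:
            continue
        overlap = len(words & passage_vocab) / len(words)
        if overlap < min_overlap:
            flagged.append(sentence)
    return flagged

# Example: a grounded claim passes; an invented customer example is flagged.
passages = ["Retrieval-augmented generation grounds answers in indexed documents."]
answer = ("Retrieval-augmented generation grounds answers in indexed documents. "
          "Acme Corp saw a 40% revenue lift after deploying it.")
print(unsupported_sentences(answer, passages))
```

Flagged sentences can be routed to a stricter "cite your source or omit" regeneration step, which also blunts prompt-injection payloads that try to smuggle unsupported ROI claims into answers.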