Context
BrightDesk is building an AI copilot for customer support agents. The feature should draft grounded answers to customer questions using help-center content, policy docs, and historical resolved tickets, while adapting to BrightDesk's tone and workflow.
Constraints
- p95 latency: ≤1,500 ms for agent-facing suggestions
- Cost ceiling: $12K/month at 300K requests/month
- Hallucination ceiling: <2% on policy-related questions
- Must not reveal internal-only notes or customer PII
- Prompt injection success rate from retrieved content should be near 0% on adversarial tests
- The engineering team can support only one production system over the next 8 weeks; avoid unnecessary complexity
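The cost and volume numbers above pin down a hard per-request spend. A quick back-of-envelope sketch; the per-token prices below are illustrative assumptions, not published rates:

```python
# Cost ceiling check: $12K/month at 300K requests/month leaves $0.04 per
# request end to end (generation, embeddings, reranking, retries).

MONTHLY_BUDGET_USD = 12_000
MONTHLY_REQUESTS = 300_000

per_request_budget = MONTHLY_BUDGET_USD / MONTHLY_REQUESTS  # 0.04

# Hypothetical per-million-token prices for a GPT-4.1-mini-class model.
# These are placeholders; check current provider pricing.
PRICE_IN_PER_MTOK = 0.40
PRICE_OUT_PER_MTOK = 1.60

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated model cost for one request, in USD."""
    return (input_tokens * PRICE_IN_PER_MTOK
            + output_tokens * PRICE_OUT_PER_MTOK) / 1_000_000

# Example: 4K prompt tokens (instructions + retrieved chunks) and a
# 400-token drafted reply.
cost = request_cost(4_000, 400)
print(f"per-request budget: ${per_request_budget:.3f}")
print(f"estimated model cost: ${cost:.4f}")
```

Under these assumed prices a 4K-in / 400-out request costs well under the $0.04 budget, which is the kind of headroom a proposal should make explicit before adding rerankers or larger models.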
Available Resources
- 40K public help-center articles and policy pages
- 2M historical support tickets with resolution labels, of which only 300K are high quality
- Existing search stack supports BM25 and vector search
- Access to GPT-4.1-mini / GPT-4.1 class models and embedding APIs
- 1,000 manually labeled evaluation examples across billing, refunds, outages, and account access
- Human reviewers from support operations can label another 200 examples if needed
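Since the existing stack already exposes both BM25 and vector search, a natural first retrieval step is to fuse the two rankings. A minimal reciprocal-rank-fusion (RRF) sketch; the document IDs are made up for illustration, and in production the two ranked lists would come from the real BM25 and vector indexes:

```python
# Hybrid retrieval via reciprocal rank fusion: each ranked list
# contributes 1 / (k + rank + 1) to a document's fused score.

def rrf_merge(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of doc IDs; higher fused score ranks earlier."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["policy-refunds", "kb-billing-101", "ticket-88321"]
vector_hits = ["policy-refunds", "kb-outage-status", "kb-billing-101"]
fused = rrf_merge([bm25_hits, vector_hits])
print(fused)  # "policy-refunds" ranks first: top hit in both lists
```

RRF sidesteps score calibration between BM25 and embedding similarity because it uses only ranks; k = 60 is the damping constant commonly used as a default.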
Task
- Propose a decision framework for when BrightDesk should use prompt engineering only, RAG, or fine-tuning, and explain the trade-offs across quality, latency, cost, maintainability, and safety.
- Design an initial production approach for the first launch, including prompt strategy, retrieval or training plan, and fallback/refusal behavior.
- Define an evaluation-first plan: offline benchmarks before launch and online metrics after launch. Be explicit about how you will measure hallucination, groundedness, style adherence, and prompt-injection robustness.
- Estimate cost and latency for your recommended approach, and explain what you would cut or change if the system misses either budget.
- Identify the top failure modes, including cases where retrieval hurts quality or fine-tuning creates stale behavior, and propose mitigations.
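One concrete piece the evaluation plan must define: the <2% hallucination ceiling implies a per-category rate computed over judged examples. A minimal sketch, assuming each labeled example has already received a grounded/not-grounded verdict from a human reviewer or an LLM judge (the judging step itself is not shown):

```python
# Per-category hallucination rate from (category, grounded) verdicts.
# A draft counts toward the hallucination rate when it is not grounded.

def hallucination_rate(results: list[tuple[str, bool]]) -> dict[str, float]:
    """Map each category to its fraction of ungrounded drafts."""
    totals: dict[str, list[int]] = {}
    for category, grounded in results:
        bad, n = totals.get(category, [0, 0])
        totals[category] = [bad + (0 if grounded else 1), n + 1]
    return {c: bad / n for c, (bad, n) in totals.items()}

# Toy verdicts over the eval categories; real runs would cover the
# 1,000 labeled examples.
results = [
    ("billing", True), ("billing", False),
    ("refunds", True), ("refunds", True),
    ("outages", True),
]
rates = hallucination_rate(results)
print(rates)  # {'billing': 0.5, 'refunds': 0.0, 'outages': 0.0}
```

Reporting per category rather than a single aggregate matters here because the 2% ceiling applies specifically to policy-related questions, which a blended average could mask.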