Choose Prompting, RAG, or Fine-Tuning

Context

Northstar Health wants an internal assistant for support agents answering questions about benefits, claims policy, and member communications. Leadership is debating whether to ship with careful prompting, a RAG system over internal documents, or fine-tuning for domain-specific behavior.

Constraints

p95 latency must be under 2,500ms for live agent assist
Budget ceiling: $35K/month at 1.2M requests/month
Hallucination rate must stay below 2% on policy questions
Prompt injection success rate must be below 0.5% on adversarial tests
Answers that affect coverage or reimbursement must cite approved sources or refuse
PHI must not be exposed to unauthorized users or external logs
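
The numeric constraints above imply a per-request spending limit: $35,000 / 1,200,000 requests, or roughly $0.029 per request. The sketch below is one way to turn the list into a mechanical pass/fail gate. The `EvalResult` fields and the `failed_constraints` helper are hypothetical names introduced for illustration. Treating the citation and PHI rules as zero-tolerance counts is an assumption, not something the prompt specifies.

```python
from dataclasses import dataclass

MONTHLY_BUDGET_USD = 35_000    # budget ceiling from the constraints
MONTHLY_REQUESTS = 1_200_000   # request volume from the constraints


def per_request_budget(budget_usd: float, requests: int) -> float:
    """Average spend allowed per request under the monthly ceiling."""
    return budget_usd / requests


@dataclass
class EvalResult:
    # Hypothetical summary of one evaluation run for a candidate architecture.
    p95_latency_ms: float
    cost_per_request_usd: float
    hallucination_rate: float       # fraction of policy answers judged hallucinated
    injection_success_rate: float   # fraction of adversarial prompts that succeeded
    uncited_coverage_answers: int   # coverage answers that neither cited nor refused
    phi_exposures: int              # PHI leaks observed in the test run


def failed_constraints(r: EvalResult) -> list[str]:
    """Return the names of the constraints this run violates (empty means pass)."""
    limit = per_request_budget(MONTHLY_BUDGET_USD, MONTHLY_REQUESTS)
    checks = {
        "p95 latency": r.p95_latency_ms < 2500,
        "cost per request": r.cost_per_request_usd <= limit,
        "hallucination rate": r.hallucination_rate < 0.02,
        "injection success rate": r.injection_success_rate < 0.005,
        # Assumption: any uncited coverage answer or PHI exposure is a failure.
        "cited or refused": r.uncited_coverage_answers == 0,
        "PHI exposure": r.phi_exposures == 0,
    }
    return [name for name, ok in checks.items() if not ok]
```

Running the same gate over prompting-only, RAG, and fine-tuned candidates keeps the comparison tied to the stated constraints rather than to impressions from demos.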

Available Resources

120K internal documents: policy manuals, claims SOPs, plan summaries, and approved email templates
18 months of historical support chats with agent-written resolutions and QA scores
Approved models: GPT-4.1-mini, GPT-4.1, and a fine-tunable smaller model for internal deployment
Existing hybrid search stack: BM25 + vector search with metadata filters by plan, state, and effective date
Security team can provide document-level ACLs and redaction services
2 product analysts and 15 QA specialists can label an evaluation set
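
The metadata filters and document-level ACLs listed above could be combined into a single eligibility check applied before any retrieved passage reaches the model. The following is a minimal sketch under assumed names: the `Doc` fields, the role-based ACL representation, and the `eligible` function are hypothetical and do not describe the existing search stack's actual schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Doc:
    # Hypothetical metadata record; real field names would come from the search index.
    doc_id: str
    plan: str
    state: str
    effective_from: date
    effective_to: Optional[date]   # None means the document is still in force
    allowed_roles: frozenset       # assumed ACL shape: roles permitted to read the doc


def eligible(doc: Doc, plan: str, state: str, as_of: date, user_roles: frozenset) -> bool:
    """True if the document matches the plan and state, is in force on as_of,
    and the user holds at least one role the ACL allows."""
    in_window = doc.effective_from <= as_of and (
        doc.effective_to is None or as_of <= doc.effective_to
    )
    has_access = bool(doc.allowed_roles & user_roles)
    return doc.plan == plan and doc.state == state and in_window and has_access


def filter_candidates(docs, plan, state, as_of, user_roles):
    """Keep only documents the user may see and that apply to the question."""
    return [d for d in docs if eligible(d, plan, state, as_of, user_roles)]
```

Filtering on the effective date guards against stale policy answers, and filtering on the ACL before generation means unauthorized content never enters the prompt.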

Task

Propose an evaluation-first framework to compare prompting-only, RAG, and fine-tuning for this use case. Define offline and online metrics, golden sets, and pass/fail thresholds before choosing an architecture.
Recommend which approach you would launch first, and where the other two approaches still fit. Be explicit about trade-offs across latency, cost, maintainability, freshness, and safety.
Design the target system for your recommended approach, including prompting strategy, retrieval or training data design, fallback/refusal behavior, and access control.
Explain how you would test and mitigate hallucination, prompt injection, stale policy answers, and PHI leakage.
Estimate request-level and monthly cost/latency, and describe what changes you would make if the system misses either the budget or latency target.
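
For the cost and latency estimate in the last task item, a small calculator helps make the assumptions explicit. The sketch below is illustrative only: the token counts and per-million-token prices in the usage note are placeholders, not vendor quotes, and the function names are hypothetical. The p95 uses the nearest-rank method over measured end-to-end latencies.

```python
import math


def cost_per_request(input_tokens: int, output_tokens: int,
                     usd_per_m_input: float, usd_per_m_output: float) -> float:
    """Model cost of one request, given token counts and per-million-token prices."""
    return (input_tokens * usd_per_m_input + output_tokens * usd_per_m_output) / 1_000_000


def monthly_cost(per_request_usd: float, requests_per_month: int = 1_200_000) -> float:
    """Monthly model spend at the request volume stated in the constraints."""
    return per_request_usd * requests_per_month


def p95_nearest_rank(latencies_ms: list) -> float:
    """95th percentile by the nearest-rank method: the value at rank ceil(0.95 * n)."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    rank = math.ceil(0.95 * len(ordered))
    return ordered[rank - 1]
```

As a usage example with placeholder numbers, a request with 3,000 input tokens and 300 output tokens at $0.50 and $2.00 per million tokens costs about $0.0021, or roughly $2,520 per month at 1.2M requests. Retrieval infrastructure, redaction, logging, and evaluation costs sit outside this figure and need separate line items before comparing against the $35K ceiling.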
