Context
FinPilot, a B2B finance software company, wants to add an AI assistant that answers customer questions about product workflows, explains billing issues, and drafts support replies. Leadership is unsure whether to ship the first version with prompt engineering only, retrieval over internal knowledge sources, or fine-tuning for domain behavior.
Constraints
- p95 latency: ≤2,500 ms for interactive support use cases
- Cost ceiling: $0.03 per request, capped at $35K/month at 1.2M requests (the monthly cap is the binding constraint, ≈$0.029 per request; see the gate sketch after this list)
- Hallucination ceiling: <2% on policy and billing questions
- Prompt injection success rate: <1% on adversarial tests
- Regulated environment: responses must not leak PII or quote internal notes the requesting user is not authorized to see
- Engineering team: 3 ML engineers, 6 weeks to MVP
Available Resources
- 120K help-center articles, internal runbooks, release notes, and policy docs
- 1.8M historical support tickets with agent-written replies and resolution labels
- Existing search stack supports BM25 and vector search (a hybrid fusion sketch follows this list)
- Approved models: a small low-cost chat model, a stronger general-purpose model, and the option to fine-tune one supported base model
- Human reviewers can label 1,000 examples for a golden set
Task
You are not asked to build the full system. Instead, define a decision framework for when this capability should use prompting, retrieval, or fine-tuning.
- Propose an eval-first plan that compares prompt-only, RAG, and fine-tuned approaches before the architecture is finalized (an illustrative harness follows this list).
- Segment the product requirements into capability types (e.g., factual QA, style consistency, workflow guidance, classification, drafting) and map each to prompting, RAG, fine-tuning, or a hybrid (a mapping skeleton follows this list).
- Design the target architecture for your recommended MVP, including where retrieval is required and where fine-tuning is and is not justified (a request-path sketch follows this list).
- Specify safety controls for hallucination, prompt injection, and access-control failures (covered in the same request-path sketch).
- Estimate cost, latency, and operational complexity for each option, then recommend one path for MVP and one likely follow-up iteration (a back-of-envelope cost model follows this list).