Context
FinPilot, a B2B finance software company, wants to add an AI assistant that answers customer questions about product workflows, explains billing issues, and drafts support replies. Leadership is unsure whether to ship the first version with prompt engineering only, retrieval over internal knowledge sources, or fine-tuning for domain behavior.
Constraints
- p95 latency: ≤2,500 ms for interactive support use cases
- Cost ceiling: $0.03 per request, capped at $35K/month at 1.2M requests (the monthly cap is the binding constraint, ≈$0.029 per request; see the gate sketch after this list)
- Hallucination ceiling: <2% on policy and billing questions
- Prompt injection success rate: <1% on adversarial tests
- Regulated environment: responses must not leak PII or quote internal notes the requesting user is not authorized to see
- Engineering team: 3 ML engineers, 6 weeks to MVP
Available Resources
- 120K help-center articles, internal runbooks, release notes, and policy docs
- 1.8M historical support tickets with agent-written replies and resolution labels
- Existing search stack supports BM25 and vector search (a hybrid fusion sketch follows this list)
- Approved models: a small low-cost chat model, a stronger general-purpose model, and the option to fine-tune one supported base model
- Human reviewers can label 1,000 examples for a golden set
Task
You are not asked to build the full system. Instead, define a decision framework for when this capability should use prompting, retrieval, or fine-tuning.
- Propose an eval-first plan that compares prompt-only, RAG, and fine-tuned approaches before the architecture is finalized (an illustrative harness follows this list).
- Segment the product requirements into capability types (e.g., factual QA, style consistency, workflow guidance, classification, drafting) and map each to prompting, RAG, fine-tuning, or a hybrid (a mapping skeleton follows this list).
- Design the target architecture for your recommended MVP, including where retrieval is required and where fine-tuning is and is not justified (a request-path sketch follows this list).
- Specify safety controls for hallucination, prompt injection, and access-control failures (covered in the same request-path sketch).
- Estimate cost, latency, and operational complexity for each option, then recommend one path for MVP and one likely follow-up iteration (a back-of-envelope cost model follows this list).