Context
ShopSphere is building an AI shopping assistant for its ecommerce app. The assistant already answers product questions from a catalog and help-center corpus using RAG, but product leadership now wants it to also handle tasks like checking live inventory, comparing shipping options, applying promo codes, and initiating returns.
Constraints
- p95 latency: at most 2,500 ms for informational queries and 4,000 ms for transactional flows
- Cost ceiling: $35K/month at 1.2M requests/month (about $0.029 per request)
- Hallucination ceiling: <2% on policy, price, and availability claims
- Prompt-injection success rate: <0.5% on adversarial evals
- Any action that changes customer state (cart, order, return, address) must be auditable and require explicit user confirmation
- The system must degrade safely to a non-agentic answer when tools are unavailable
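
The numeric ceilings above can be captured as a small config for an evaluation harness. The sketch below is illustrative only: the class, field, and function names are assumptions, not part of ShopSphere's stack. It treats the latency and cost figures as inclusive limits and the two rate ceilings as strict, and it derives the per-request budget implied by the cost ceiling ($35,000 / 1,200,000, roughly $0.029).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Constraints:
    """Numeric limits copied from the constraint list (names are hypothetical)."""
    p95_informational_ms: int = 2_500
    p95_transactional_ms: int = 4_000
    monthly_cost_usd: float = 35_000.0
    monthly_requests: int = 1_200_000
    hallucination_ceiling: float = 0.02   # strict: measured rate must be below 2%
    injection_ceiling: float = 0.005      # strict: measured rate must be below 0.5%

    @property
    def cost_per_request_usd(self) -> float:
        # Average budget per request implied by the monthly ceiling.
        return self.monthly_cost_usd / self.monthly_requests


def violations(c: Constraints, *, path: str, p95_ms: float,
               cost_per_request_usd: float, hallucination_rate: float,
               injection_success_rate: float) -> list[str]:
    """Return the names of the constraints that a measured run breaks."""
    if path not in ("informational", "transactional"):
        raise ValueError(f"unknown path: {path}")
    limit_ms = (c.p95_informational_ms if path == "informational"
                else c.p95_transactional_ms)
    broken = []
    if p95_ms > limit_ms:
        broken.append("latency")
    if cost_per_request_usd > c.cost_per_request_usd:
        broken.append("cost")
    if hallucination_rate >= c.hallucination_ceiling:
        broken.append("hallucination")
    if injection_success_rate >= c.injection_ceiling:
        broken.append("prompt_injection")
    return broken
```

A run is acceptable when `violations(...)` returns an empty list. Note that the per-request figure is an average: an agent path can cost more than $0.029 per request as long as the blended cost across both paths stays under the monthly ceiling.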
Available Resources
- RAG corpus: product catalog, reviews, FAQs, return policy, shipping policy, seller documentation
- Tools/APIs: live inventory, pricing, promotions, cart service, order history, returns eligibility, shipping ETA, CRM ticket creation
- Models available: a low-cost model for routing and classification, and a stronger model for answer generation and tool use
- Historical chat logs for 50K sessions, plus a labeled set of 800 queries across browse, compare, buy, and post-purchase intents
Task
- Design a decision framework for when the assistant should use RAG only versus a tool-enabled agent, with examples of queries that belong in each path.
- Propose an end-to-end architecture, including intent routing, retrieval, tool calling, confirmation steps for actions, and fallback behavior when tools fail.
- Define the evaluation plan before finalizing the design: offline and online metrics for answer quality, task success, hallucination, prompt-injection resilience, latency, and cost.
- Write a system prompt for the agent path that enforces groundedness, safe tool use, explicit confirmation before mutations, and refusal behavior.
- Estimate cost and latency for the two paths and explain the main tradeoffs in choosing agents over a RAG-only experience.
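
As one possible starting point for the confirmation and fallback requirements, the sketch below shows a tool-execution gate. Everything in it is hypothetical: the tool names, the `ToolCall` shape, and the `rag_fallback` callback are assumptions made for illustration, not a prescribed design. It blocks state-changing calls until the user has confirmed, records confirmed mutations in an audit log, and returns a non-agentic answer when a tool is unavailable.

```python
from dataclasses import dataclass
from typing import Any, Callable


class ToolUnavailable(Exception):
    """Raised by a tool wrapper when its backing API cannot be reached."""


# Hypothetical tool names for calls that change customer state.
MUTATING_TOOLS = frozenset({"cart.add_item", "promotions.apply", "returns.initiate"})


@dataclass
class ToolCall:
    tool: str
    args: dict[str, Any]
    user_confirmed: bool = False  # set only after an explicit "yes" from the user


def execute(call: ToolCall,
            tools: dict[str, Callable[..., Any]],
            audit_log: list[dict[str, Any]],
            rag_fallback: Callable[[], str]) -> dict[str, Any]:
    """Run one tool call behind a confirmation gate with a safe fallback."""
    mutating = call.tool in MUTATING_TOOLS
    if mutating and not call.user_confirmed:
        # Do not touch customer state; ask the user to confirm first.
        return {"status": "needs_confirmation", "tool": call.tool, "args": call.args}
    try:
        result = tools[call.tool](**call.args)
    except ToolUnavailable:
        # Degrade to the RAG-only answer instead of guessing live data.
        return {"status": "degraded", "answer": rag_fallback()}
    if mutating:
        audit_log.append({"tool": call.tool, "args": call.args, "result": result})
    return {"status": "ok", "result": result}
```

This sketch only audits successful mutations; a fuller design would likely also log confirmation prompts and failed attempts so that the whole flow can be reconstructed.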