Context
ShopWave, a mid-market e-commerce platform, wants an LLM-powered chatbot for real-time customer engagement across web and mobile. The bot should answer order-status, returns, billing, and product-policy questions, and escalate to a human agent when confidence is low or the request is sensitive.
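The routing and escalation behavior described above can be sketched as a small decision function. This is a minimal sketch, not a prescribed design. The intent labels, the set of sensitive intents, and the 0.7 confidence floor are all hypothetical placeholders that would need tuning against the existing conversation logs.

```python
from dataclasses import dataclass

# Hypothetical labels: the brief names question types but not a label set.
SENSITIVE_INTENTS = {"billing_dispute", "account_security", "legal_complaint"}
TOOL_INTENTS = {"order_status", "returns"}
CONFIDENCE_FLOOR = 0.7  # hypothetical threshold, to be tuned on logged outcomes


@dataclass
class Classification:
    intent: str        # label produced by the fast small model
    confidence: float  # classifier score in [0, 1]


def route(c: Classification) -> str:
    """Return the next pipeline stage for one classified user turn."""
    if c.intent in SENSITIVE_INTENTS:
        return "handoff"          # sensitive requests always go to a human
    if c.confidence < CONFIDENCE_FLOOR:
        return "handoff"          # low-confidence classification escalates
    if c.intent in TOOL_INTENTS:
        return "tool_lookup"      # needs get_order_status / create_return
    return "grounded_answer"      # retrieval + stronger-model generation
```

Checking sensitivity before confidence means a confidently classified sensitive request still reaches a human, which matches the brief's "or" condition.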
Constraints
- p95 end-to-end latency: ≤2,500 ms for standard Q&A; ≤4,000 ms for tool-backed order lookups
- Cost ceiling: $35K/month at 1.2M conversations/month
- Hallucination ceiling: <2% on policy- and account-related answers
- Must resist prompt injection from user messages and retrieved content
- Must not expose PII or account data without authentication and authorization
- Responses should be grounded in approved help-center and policy content, with citations for factual claims
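The cost ceiling works out to roughly $0.029 per conversation on average ($35,000 / 1,200,000). The sketch below turns that into a quick feasibility check. The turn count, token counts, and per-1K-token prices are hypothetical figures chosen for illustration, not quotes for any real model.

```python
# Figures taken from the brief.
MONTHLY_BUDGET_USD = 35_000
CONVERSATIONS_PER_MONTH = 1_200_000


def per_conversation_budget() -> float:
    """Maximum average spend per conversation that stays under the monthly ceiling."""
    return MONTHLY_BUDGET_USD / CONVERSATIONS_PER_MONTH


def conversation_cost(turns: int, in_tokens: int, out_tokens: int,
                      usd_per_1k_in: float, usd_per_1k_out: float) -> float:
    """Estimated model spend for one conversation; every input here is hypothetical."""
    per_turn = (in_tokens / 1000) * usd_per_1k_in + (out_tokens / 1000) * usd_per_1k_out
    return turns * per_turn


if __name__ == "__main__":
    budget = per_conversation_budget()
    # Hypothetical: 4 turns, 3,000 prompt + 300 output tokens per turn, all on the stronger model.
    cost = conversation_cost(4, 3000, 300, usd_per_1k_in=0.002, usd_per_1k_out=0.008)
    print(f"budget/conversation:   ${budget:.4f}")                               # $0.0292
    print(f"estimate/conversation: ${cost:.4f}")                                 # $0.0336
    print(f"projected monthly:     ${cost * CONVERSATIONS_PER_MONTH:,.0f}")      # $40,320
```

Under these made-up numbers, sending every turn to the stronger model lands above the ceiling, which suggests how much the estimate depends on routing simple turns to the small model and keeping retrieved context short.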
Available Resources
- 120K help-center articles, return/shipping policies, product FAQs, and agent macros
- Structured tools:
  - get_order_status(order_id, user_id)
  - create_return(order_id, item_id)
  - handoff_to_agent(reason)
- Conversation logs from the current rules-based chatbot, including CSAT and escalation outcomes
- Approved models: a fast small model for classification/routing and a stronger model for grounded answer generation
- Existing hybrid search stack (BM25 + vector search) and a reranker service
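One way to honor the authentication constraint is to keep tool execution behind a gateway that takes the user identity from the authenticated session rather than from model-generated arguments. Note that create_return(order_id, item_id), as listed, carries no user_id, so ownership has to be checked before calling it. The sketch below assumes a hypothetical order_owner lookup and passes the real tools in as callables. ToolGateway, AuthError, and the method names are illustrative, not part of the listed resources.

```python
from typing import Callable, Optional


class AuthError(Exception):
    """Raised when a tool call is not backed by an authenticated, authorized user."""


class ToolGateway:
    """Hypothetical wrapper: the model proposes tool calls, this layer enforces auth."""

    def __init__(self,
                 order_owner: Callable[[str], Optional[str]],    # hypothetical: order_id -> owner user_id
                 get_order_status: Callable[[str, str], dict],   # (order_id, user_id)
                 create_return: Callable[[str, str], dict]):     # (order_id, item_id)
        self._order_owner = order_owner
        self._get_order_status = get_order_status
        self._create_return = create_return

    def _check_owner(self, session_user: Optional[str], order_id: str) -> str:
        if session_user is None:
            raise AuthError("authentication required")
        if self._order_owner(order_id) != session_user:
            raise AuthError("order does not belong to this user")
        return session_user

    def order_status(self, session_user: Optional[str], order_id: str) -> dict:
        user_id = self._check_owner(session_user, order_id)
        # user_id comes from the session, never from model-generated arguments.
        return self._get_order_status(order_id, user_id)

    def start_return(self, session_user: Optional[str], order_id: str, item_id: str) -> dict:
        # create_return takes no user_id, so ownership is verified here first.
        self._check_owner(session_user, order_id)
        return self._create_return(order_id, item_id)
```

Because the model never supplies user_id, an injected instruction to look up someone else's order fails the ownership check instead of widening access.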
Task
- Design the end-to-end chatbot architecture, including intent classification, retrieval, tool use, escalation logic, and safety controls.
- Write the system prompt for the answer-generation stage so the bot stays grounded, asks clarifying questions when needed, and declines to make claims the retrieved content does not support.
- Define an evaluation plan before implementation: offline golden sets, adversarial prompt-injection tests, hallucination measurement, and online success metrics.
- Estimate latency and cost at target volume, and explain how you would stay within both budgets.
- Identify the top failure modes in production and propose mitigations, monitoring, and rollback criteria.
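For the hallucination measurement in the evaluation plan, a point estimate on a small golden set can look under 2% while the data cannot rule out a higher true rate. A minimal sketch, swapping in the Wilson score interval (a standard binomial confidence interval; the brief does not specify a method), treats the ceiling as met only when the interval's upper bound falls below it. The default z = 1.96 is the two-sided 95% value; 1.645 would give a one-sided 95% bound.

```python
import math


def wilson_upper(failures: int, n: int, z: float = 1.96) -> float:
    """Upper end of the Wilson score interval for a binomial failure rate."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = failures / n
    z2 = z * z
    denom = 1 + z2 / n
    center = (p + z2 / (2 * n)) / denom
    margin = (z / denom) * math.sqrt(p * (1 - p) / n + z2 / (4 * n * n))
    return center + margin


def meets_ceiling(failures: int, n: int, ceiling: float = 0.02, z: float = 1.96) -> bool:
    """True only if the data support a true hallucination rate below the ceiling."""
    return wilson_upper(failures, n, z) < ceiling
```

For example, 6 graded hallucinations in 600 answers (1% observed) gives an upper bound of about 2.2%, so that sample alone would not show the ceiling is met, while 20 in 2,000 (also 1%) gives about 1.5%. This is one input to sizing the offline golden set.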