Context
Pulse, a consumer mobile banking app, wants to launch an in-app AI assistant that answers account questions, explains transactions, and helps users navigate support flows. The assistant must feel fast on mobile, avoid hallucinations, and handle sensitive financial data safely.
Constraints
- p95 end-to-end latency: ≤1,500 ms on mobile networks
- Cost ceiling: $0.015 per request, with projected volume of 8M requests/month
- Hallucination rate: <1% on a high-risk golden set covering balances, fees, card controls, and dispute guidance
- Low-confidence behavior must be explicit: instead of guessing, the assistant should ask a clarifying question, defer to a tool-backed answer, or escalate to human support
- Must not expose PII, secrets, or internal policies beyond user-authorized data
- Must resist prompt injection attempts from user input, OCR text, pasted emails, and retrieved support content
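The cost and latency constraints above can be turned into a concrete per-request budget check. A minimal sketch, assuming illustrative per-1K-token prices (only the $0.015/request ceiling and 8M requests/month come from the brief; `request_cost` and the token counts are hypothetical):

```python
COST_CEILING_PER_REQUEST = 0.015   # USD, from the constraints
MONTHLY_VOLUME = 8_000_000         # requests/month, from the constraints

# Spending the full ceiling on every request implies a $120,000/month bill.
monthly_budget = COST_CEILING_PER_REQUEST * MONTHLY_VOLUME

def request_cost(prompt_tokens: int, completion_tokens: int,
                 price_in: float, price_out: float) -> float:
    """Estimated cost of one request given per-1K-token prices."""
    return prompt_tokens / 1000 * price_in + completion_tokens / 1000 * price_out

# Example with assumed frontier-model prices of $0.005/1K in, $0.015/1K out.
cost = request_cost(prompt_tokens=1500, completion_tokens=300,
                    price_in=0.005, price_out=0.015)   # -> 0.012
within_budget = cost <= COST_CEILING_PER_REQUEST
```

A check like this makes the routing decision explicit: requests whose estimated cost exceeds the ceiling are candidates for the cheaper fallback model or a trimmed context.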
Available Resources
- Mobile app context: authenticated user ID, locale, device type, and current screen
- Read-only tools for balances, recent transactions, card status, branch hours, and support ticket status
- 2,000 approved help-center articles and policy documents
- One frontier chat model and one cheaper fallback model from an approved provider
- Existing redaction service for account numbers, SSNs, and payment card data
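One way to wire these resources together is to route every read-only tool result through the redaction service before it ever reaches the model context. A sketch under that assumption; all names (`redact`, `get_balance`, `TOOLS`, `call_tool`) are illustrative, and the regex stands in for the existing redaction service:

```python
import re
from typing import Callable

def redact(text: str) -> str:
    """Stand-in for the existing redaction service: mask long digit runs
    (account numbers, SSNs, PANs)."""
    return re.sub(r"\d{9,}", "[REDACTED]", text)

def get_balance(user_id: str) -> str:
    # Placeholder read-only tool; the real version calls the banking backend
    # with the authenticated user ID from the mobile app context.
    return "Checking ****1234: $2,410.88 (account 123456789012)"

TOOLS: dict[str, Callable[[str], str]] = {"get_balance": get_balance}

def call_tool(name: str, user_id: str) -> str:
    """Every tool result is redacted before being added to model context,
    so raw account numbers never appear in prompts or logs."""
    return redact(TOOLS[name](user_id))
```

Placing redaction at the tool boundary (rather than on model output) limits what the model can leak in the first place.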
Task
- Design the assistant experience for normal answers, low-confidence cases, refusals, and escalation to human support.
- Write the system prompt and response schema that enforce grounded, safe, mobile-friendly behavior.
- Define an evaluation plan before the architecture: offline safety and quality tests plus online guardrail metrics.
- Propose the serving architecture, including when to use tools, when to use retrieval, and how to minimize cost and latency.
- Identify major failure modes around hallucination, prompt injection, and sensitive-data leakage, with mitigations.
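One possible shape for the response schema the task asks for, sketched in Python. Field names and the 0.7 confidence threshold are illustrative assumptions, not prescribed by the brief; the point is that every reply declares an explicit action, so low-confidence cases cannot silently become guesses:

```python
from dataclasses import dataclass, field

# Illustrative action set covering normal answers, low-confidence cases,
# refusals, and escalation from the task description.
ACTIONS = {"answer", "clarify", "tool_answer", "escalate", "refuse"}

@dataclass
class AssistantResponse:
    action: str                                        # one of ACTIONS
    text: str                                          # short, mobile-friendly
    sources: list[str] = field(default_factory=list)   # help-center article IDs
    confidence: float = 1.0                            # estimated, 0..1

    def validate(self) -> None:
        if self.action not in ACTIONS:
            raise ValueError(f"unknown action: {self.action}")
        # Grounding rule: direct answers must cite a source or tool result.
        if self.action == "answer" and not self.sources:
            raise ValueError("ungrounded answer: no sources")
        # Low-confidence replies must clarify, use a tool, or escalate.
        if self.action == "answer" and self.confidence < 0.7:
            raise ValueError("low confidence: clarify, tool_answer, or escalate")

r = AssistantResponse(action="clarify",
                      text="Which card do you mean: the debit or the credit card?")
r.validate()  # clarifying questions need no sources
```

Enforcing the schema server-side (rejecting or retrying non-conforming model output) is what makes the grounding and low-confidence constraints testable in the evaluation plan.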