Context
FinSure, a B2B insurance platform, wants an internal agentic AI assistant for operations analysts. The assistant should answer policy and claims questions, summarize customer cases, and take limited actions such as creating follow-up tasks or drafting emails. It will operate over internal knowledge bases and a few approved tools, but must be safe for production use in a regulated environment.
Constraints
- p95 end-to-end latency: ≤3,500 ms for read-only requests; ≤6,000 ms for action-taking requests
- Cost ceiling: $35K/month at 200K requests/month, i.e. about $0.175 per request (worked budget after this list)
- Hallucination ceiling: <2% rate of materially unsupported statements on a 400-task golden set
- Prompt-injection success rate: <1% on adversarial evals
- No raw PII or policy documents may be sent to non-approved external systems
- All tool actions must be auditable and require policy-aware authorization
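
The cost ceiling implies a hard per-request budget; the arithmetic below works it out from the numbers above. The per-call prices are illustrative assumptions, not quoted rates for the approved models.

```python
# Per-request budget implied by the cost ceiling above.
monthly_ceiling_usd = 35_000
monthly_requests = 200_000
budget = monthly_ceiling_usd / monthly_requests
print(f"${budget:.3f} per request")  # $0.175

# Illustrative split under ASSUMED prices (not real quotes): one
# frontier-model call at ~$0.15 plus two fast-model calls at ~$0.01 each
# is $0.17, nearly the whole budget. Most traffic therefore has to stay
# on the fast model, with frontier calls rationed.
frontier_call, fast_call = 0.15, 0.01
print(f"${frontier_call + 2 * fast_call:.2f} for one frontier + two fast calls")
```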
Available Resources
- 1.2M internal documents: policy manuals, SOPs, claims playbooks, compliance memos, and ticket history
- Approved models: one high-quality frontier model and one cheaper, faster model
- Internal tools: document search, customer profile lookup, task creation, email drafting, and case status APIs
- Existing IAM, document ACLs, audit logging, and DLP/PII redaction services (egress-gate sketch after this list)
- A security team that can label adversarial prompt-injection and data-exfiltration test cases
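
To make the PII constraint concrete, here is a minimal sketch of an egress gate that routes every outbound payload through the existing DLP/PII redaction service unless the destination is on an internal allowlist. All names here (`APPROVED_DESTINATIONS`, `redact_pii`, `release`) are invented stand-ins, not FinSure APIs.

```python
from dataclasses import dataclass

# Invented allowlist standing in for FinSure's approved internal systems.
APPROVED_DESTINATIONS = {"document-search", "case-status-api", "task-api"}

@dataclass
class OutboundPayload:
    destination: str
    text: str

def redact_pii(text: str) -> str:
    """Stand-in for the existing DLP/PII redaction service."""
    return text.replace("SSN:", "[REDACTED]")  # toy rule only

def release(payload: OutboundPayload) -> str:
    """Gate every egress: approved systems may receive raw text;
    anything else gets the redacted version."""
    if payload.destination in APPROVED_DESTINATIONS:
        return payload.text
    return redact_pii(payload.text)
```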
Deliverables
- Design a production architecture for the agentic assistant that supports retrieval, tool use, authorization, and auditability while preserving data privacy (sketch 1 after this list).
- Define an eval-first plan: offline evaluation before launch and online monitoring after launch, including hallucination, prompt injection, privacy leakage, and task success (sketch 2).
- Write the system prompt and tool-use policy that constrain the agent's behavior, including refusal and escalation rules (the TOOL_POLICY table in sketch 1 is one shape the machine-readable half can take).
- Explain cost/latency tradeoffs, including when to use the cheaper model, when to avoid agent loops, and how to cap tool calls (sketch 3).
- Identify the top failure modes in production and how you would detect and mitigate them (sketch 4).
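
Sketch 1 (architecture and tool-use-policy deliverables): one possible shape for policy-aware authorization and audit logging around every tool call. The tool names mirror the approved internal tools; `iam_has_permission`, `audit_log`, and `execute` are stubs standing in for the existing IAM, audit-logging, and tool APIs.

```python
import json
import time
import uuid

# Tool-use policy: approved tools only, with the IAM permission each
# requires and whether it mutates state (mutating tools fall under the
# 6,000 ms action budget and the refusal/escalation rules).
TOOL_POLICY = {
    "document_search":  {"mutates": False, "permission": "docs.read"},
    "customer_profile": {"mutates": False, "permission": "crm.read"},
    "case_status":      {"mutates": False, "permission": "claims.read"},
    "create_task":      {"mutates": True,  "permission": "tasks.write"},
    "draft_email":      {"mutates": True,  "permission": "email.draft"},
}

def iam_has_permission(user_id: str, permission: str) -> bool:
    return True  # stub for the existing IAM service

def audit_log(line: str) -> None:
    print(line)  # stub for the existing audit-logging service

def execute(tool: str, args: dict) -> dict:
    return {"ok": True}  # stub dispatcher to the real tool APIs

def call_tool(user_id: str, tool: str, args: dict) -> dict:
    policy = TOOL_POLICY.get(tool)
    if policy is None:
        raise PermissionError(f"tool {tool!r} is not in the approved set")
    if not iam_has_permission(user_id, policy["permission"]):
        raise PermissionError(f"{user_id} lacks {policy['permission']}")
    # Write the audit record BEFORE executing, so refused or failed
    # actions are still traceable.
    audit_log(json.dumps({"id": str(uuid.uuid4()), "ts": time.time(),
                          "user": user_id, "tool": tool, "args": args}))
    return execute(tool, args)
```

A table like TOOL_POLICY can double as the machine-readable half of the tool-use policy, with the system prompt carrying the natural-language refusal and escalation rules.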
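Sketch 2 (eval-first deliverable): an offline launch gate for the two hardest constraint metrics. Treating the hallucination ceiling as "fraction of golden-set tasks with at least one materially unsupported statement" is one plausible operationalization, and the grading fields are assumed to come from human or model graders.

```python
def hallucination_rate(graded_tasks: list[dict]) -> float:
    """Fraction of golden-set tasks flagged with >=1 materially
    unsupported statement (one way to operationalize the <2% ceiling)."""
    flagged = sum(1 for t in graded_tasks if t["unsupported_statements"] > 0)
    return flagged / len(graded_tasks)

def injection_success_rate(attack_outcomes: list[bool]) -> float:
    """Fraction of security-team adversarial prompts that succeeded."""
    return sum(attack_outcomes) / len(attack_outcomes)

def launch_gate(graded_tasks: list[dict], attack_outcomes: list[bool]) -> bool:
    """Block launch unless both offline metrics clear the constraints."""
    return (hallucination_rate(graded_tasks) < 0.02
            and injection_success_rate(attack_outcomes) < 0.01)

# Example: a 400-task golden set with 5 flagged tasks (1.25%) and a
# 200-case adversarial set with 1 success (0.5%) clears the gate.
golden = ([{"unsupported_statements": 1}] * 5
          + [{"unsupported_statements": 0}] * 395)
attacks = [True] + [False] * 199
assert launch_gate(golden, attacks)
```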
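Sketch 3 (cost/latency deliverable): a routing heuristic plus a hard tool-call cap. Serving read-only questions with the fast model and reserving the frontier model for action-taking requests is an assumption to validate against the golden set, not a rule from the brief.

```python
MAX_TOOL_CALLS = 4   # assumed cap; tune so worst-case loops fit p95 latency
ACTION_TOOLS = {"create_task", "draft_email"}

def pick_model(planned_tools: set[str]) -> str:
    """Route by request type: frontier model only when actions are in play."""
    return "frontier" if planned_tools & ACTION_TOOLS else "fast"

def run_agent(request: str, planned_tools: set[str]) -> str:
    model = pick_model(planned_tools)
    for _ in range(MAX_TOOL_CALLS):
        # one plan -> tool-call -> observe iteration with `model` goes here;
        # return as soon as the agent has a grounded answer
        ...
    # Cap reached without an answer: refuse or escalate to a human rather
    # than keep looping; unbounded loops break both budgets.
    return "escalate_to_human"
```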
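Sketch 4 (failure-modes deliverable): a toy online monitor whose alert thresholds mirror the offline ceilings. The event names are invented; in production they would be counters emitted by the gates in sketch 1 and the egress gate above.

```python
from collections import Counter

class ProductionMonitor:
    """Counts per-request events and raises alerts when live rates
    approach the offline ceilings. Event names are illustrative."""

    def __init__(self) -> None:
        self.counts = Counter()

    def record(self, event: str) -> None:
        # expected events: "request", "unsupported", "injection_attempt",
        # "pii_egress_blocked"
        self.counts[event] += 1

    def alerts(self) -> list[str]:
        total = max(self.counts["request"], 1)
        out = []
        if self.counts["unsupported"] / total >= 0.02:
            out.append("sampled hallucination rate at the 2% ceiling")
        if self.counts["injection_attempt"] / total >= 0.01:
            out.append("injection attempts above 1% of traffic")
        if self.counts["pii_egress_blocked"] > 0:
            out.append("PII egress attempted: page security immediately")
        return out
```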