Context
Northstar Mutual wants to add an LLM-powered assistant into its existing claims workflow. The feature reads incoming claim packets (adjuster notes, policy documents, photos/OCR text, prior claim history, and customer emails) and drafts a triage summary with recommended next actions for a human adjuster.
Constraints
- p95 end-to-end latency: 3,000ms per claim packet
- Cost ceiling: $0.08 per processed claim ($40K/month at 500K claims/month)
- Accuracy bar:
  - critical-fact extraction recall ≥ 97%
  - unsupported recommendation rate < 2%
  - wrong policy-citation rate < 1%
- Safety requirements:
  - no final claim approval/denial by the model
  - must cite policy clauses and source documents for every recommendation
  - must resist prompt injection from customer-submitted text or OCR artifacts
  - must not expose PII beyond authorized workflow systems
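The cost ceiling can be sanity-checked against a per-claim token budget. The per-token prices below are illustrative assumptions, not actual provider rates:

```python
# Sanity-check the $0.08/claim ceiling against a hypothetical token budget.
# Prices are assumptions for illustration, not published provider rates.
PRICE_IN_PER_1K = 0.003   # $ per 1K input tokens (assumed)
PRICE_OUT_PER_1K = 0.015  # $ per 1K output tokens (assumed)

def cost_per_claim(input_tokens: int, output_tokens: int) -> float:
    """Estimated LLM cost for one claim packet."""
    return (input_tokens / 1000) * PRICE_IN_PER_1K + \
           (output_tokens / 1000) * PRICE_OUT_PER_1K

# e.g. a 12K-token claim packet summarized into 1K tokens of output
c = cost_per_claim(12_000, 1_000)        # $0.051, under the $0.08 ceiling
monthly = c * 500_000                    # $25,500/month at target volume
```

Under these assumed prices, roughly 12K input tokens per claim fits the budget; a larger context window or a pricier model would need the arithmetic redone.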
Available Data / Models
- 2M historical claim packets with final adjuster outcomes
- 150K policy documents and endorsements in PDF/text form
- Existing claims management system with APIs for claim metadata, fraud flags, and customer history
- Approved LLM provider (OpenAI or Anthropic), internal vector store, and OCR pipeline
- 40 senior adjusters available to label a golden set of 1,000 claims
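With 40 adjusters and a 1,000-claim golden set, each adjuster labels roughly 25 claims; drawing the set stratified by claim attributes keeps rare claim types represented. A minimal sketch, assuming each claim record carries a `claim_type` field (the field name and strata are hypothetical):

```python
import random
from collections import defaultdict

def stratified_sample(claims, key, total=1000, seed=42):
    """Draw a golden set stratified by a claim attribute (e.g. claim type),
    proportional to each stratum's share of the corpus."""
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for c in claims:
        buckets[c[key]].append(c)
    n = len(claims)
    sample = []
    for stratum, items in buckets.items():
        k = max(1, round(total * len(items) / n))  # at least 1 per stratum
        sample.extend(rng.sample(items, min(k, len(items))))
    return sample[:total]
```

Proportional allocation with a floor of one claim per stratum is one reasonable choice; oversampling rare or high-severity strata is another, at the cost of reweighting the metrics.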
Deliverables
- Design the end-to-end LLM system that integrates into the current claims workflow, including retrieval, prompting, and human-review checkpoints.
- Define the evaluation plan before committing to an architecture: offline golden-set evaluation, adversarial testing, and online launch metrics.
- Specify how the system grounds recommendations in policy text and claim evidence while minimizing hallucinations.
- Estimate cost and latency at target volume, including where you would use smaller vs larger models.
- Identify major failure modes, especially prompt injection, unsupported recommendations, stale policy retrieval, and PII leakage.
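One concrete guard for the grounding requirement and the unsupported-recommendation bar: post-validate that every citation the model emits resolves to a clause actually retrieved for that claim, and route any mismatch to human review. A minimal sketch, assuming a hypothetical `[clause:ID]` citation format:

```python
import re

# Assumed citation convention the prompt asks the model to emit, e.g. [clause:HO-3.12]
CITATION_RE = re.compile(r"\[clause:([A-Z0-9.-]+)\]")

def validate_citations(draft: str, retrieved_clause_ids: set[str]) -> list[str]:
    """Return citation IDs in the draft that were NOT among the clauses
    retrieved for this claim; a non-empty result means the recommendation
    is unsupported and the draft should be flagged for adjuster review."""
    cited = CITATION_RE.findall(draft)
    return [cid for cid in cited if cid not in retrieved_clause_ids]

draft = "Recommend subrogation review per [clause:HO-3.12] and [clause:PA-9.1]."
bad = validate_citations(draft, {"HO-3.12"})
# bad == ["PA-9.1"] -> cited clause was never retrieved; flag the draft
```

This check catches fabricated citations, but not a real clause cited for the wrong reason; the latter still needs the golden-set evaluation and human review checkpoints.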