Context
Northstar Mutual wants to add an LLM-powered assistant into its existing claims workflow. The feature reads incoming claim packets (adjuster notes, policy documents, photos/OCR text, prior claim history, and customer emails) and drafts a triage summary with recommended next actions for a human adjuster.
Constraints
- p95 end-to-end latency: 3,000ms per claim packet
- Cost ceiling: $0.08 per processed claim ($40K/month at 500K claims/month)
- Accuracy bar:
  - critical-fact extraction recall ≥ 97%
  - unsupported recommendation rate < 2%
  - wrong policy-citation rate < 1%
- Safety requirements:
  - no final claim approval/denial by the model
  - must cite policy clauses and source documents for every recommendation
  - must resist prompt injection from customer-submitted text or OCR artifacts
  - must not expose PII beyond authorized workflow systems
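The cost ceiling can be sanity-checked against a per-claim token budget. The per-token prices below are illustrative assumptions, not actual provider rates:

```python
# Sanity-check the $0.08/claim ceiling against a hypothetical token budget.
# Prices are assumptions for illustration, not published provider rates.
PRICE_IN_PER_1K = 0.003   # $ per 1K input tokens (assumed)
PRICE_OUT_PER_1K = 0.015  # $ per 1K output tokens (assumed)

def cost_per_claim(input_tokens: int, output_tokens: int) -> float:
    """Estimated LLM cost for one claim packet."""
    return (input_tokens / 1000) * PRICE_IN_PER_1K + \
           (output_tokens / 1000) * PRICE_OUT_PER_1K

# e.g. a 12K-token claim packet summarized into 1K tokens of output
c = cost_per_claim(12_000, 1_000)        # $0.051, under the $0.08 ceiling
monthly = c * 500_000                    # $25,500/month at target volume
```

Under these assumed prices, roughly 12K input tokens per claim fits the budget; a larger context window or a pricier model would need the arithmetic redone.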
Available Data / Models
- 2M historical claim packets with final adjuster outcomes
- 150K policy documents and endorsements in PDF/text form
- Existing claims management system with APIs for claim metadata, fraud flags, and customer history
- Approved LLM provider (OpenAI or Anthropic), internal vector store, and OCR pipeline
- 40 senior adjusters available to label a golden set of 1,000 claims
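With 40 adjusters and a 1,000-claim golden set, each adjuster labels roughly 25 claims; drawing the set stratified by claim attributes keeps rare claim types represented. A minimal sketch, assuming each claim record carries a `claim_type` field (the field name and strata are hypothetical):

```python
import random
from collections import defaultdict

def stratified_sample(claims, key, total=1000, seed=42):
    """Draw a golden set stratified by a claim attribute (e.g. claim type),
    proportional to each stratum's share of the corpus."""
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for c in claims:
        buckets[c[key]].append(c)
    n = len(claims)
    sample = []
    for stratum, items in buckets.items():
        k = max(1, round(total * len(items) / n))  # at least 1 per stratum
        sample.extend(rng.sample(items, min(k, len(items))))
    return sample[:total]
```

Proportional allocation with a floor of one claim per stratum is one reasonable choice; oversampling rare or high-severity strata is another, at the cost of reweighting the metrics.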
Deliverables
- Design the end-to-end LLM system that integrates into the current claims workflow, including retrieval, prompting, and human-review checkpoints.
- Define the evaluation plan before committing to an architecture: offline golden-set evaluation, adversarial testing, and online launch metrics.
- Specify how the system grounds recommendations in policy text and claim evidence while minimizing hallucinations.
- Estimate cost and latency at target volume, including where you would use smaller vs larger models.
- Identify major failure modes, especially prompt injection, unsupported recommendations, stale policy retrieval, and PII leakage.
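One concrete guard for the grounding requirement and the unsupported-recommendation bar: post-validate that every citation the model emits resolves to a clause actually retrieved for that claim, and route any mismatch to human review. A minimal sketch, assuming a hypothetical `[clause:ID]` citation format:

```python
import re

# Assumed citation convention the prompt asks the model to emit, e.g. [clause:HO-3.12]
CITATION_RE = re.compile(r"\[clause:([A-Z0-9.-]+)\]")

def validate_citations(draft: str, retrieved_clause_ids: set[str]) -> list[str]:
    """Return citation IDs in the draft that were NOT among the clauses
    retrieved for this claim; a non-empty result means the recommendation
    is unsupported and the draft should be flagged for adjuster review."""
    cited = CITATION_RE.findall(draft)
    return [cid for cid in cited if cid not in retrieved_clause_ids]

draft = "Recommend subrogation review per [clause:HO-3.12] and [clause:PA-9.1]."
bad = validate_citations(draft, {"HO-3.12"})
# bad == ["PA-9.1"] -> cited clause was never retrieved; flag the draft
```

This check catches fabricated citations, but not a real clause cited for the wrong reason; the latter still needs the golden-set evaluation and human review checkpoints.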