Product Context
ShieldSure is a national insurer that handles auto, home, and health-related claims through its mobile app and call-center workflows. Design an ML system that routes and prioritizes incoming claims and assists adjusters by estimating severity, fraud risk, document completeness, and next-best action.
Scale
| Signal | Value |
|---|---|
| Active policyholders | 25M |
| Claims submitted/day | 1.2M |
| Peak claim-event QPS | 4,500 |
| Historical claims archive | 180M claims |
| Documents/images per day | 9M |
| Human adjusters | 18,000 |
| p99 decision latency budget | 250 ms for synchronous triage |
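
As a starting point for capacity planning, the figures in the table can be turned into a few derived rates. The sketch below uses only the table values; the variable names are illustrative. Note that the peak figure is stated for claim events, which may cover more than new submissions, so the peak-to-average ratio is an upper-bound style comparison rather than a like-for-like one.

```python
# Back-of-envelope rates derived only from the Scale table.
SECONDS_PER_DAY = 24 * 60 * 60

claims_per_day = 1_200_000
peak_event_qps = 4_500
docs_per_day = 9_000_000
adjusters = 18_000

# Average submission rate if claims arrived uniformly over the day.
avg_claim_qps = claims_per_day / SECONDS_PER_DAY

# Average document/image ingestion rate.
avg_doc_qps = docs_per_day / SECONDS_PER_DAY

# Mean attachments per claim and mean daily load per adjuster.
docs_per_claim = docs_per_day / claims_per_day
claims_per_adjuster_per_day = claims_per_day / adjusters

# Assumption: compares peak *event* QPS against average *submission* QPS.
peak_to_avg_ratio = peak_event_qps / avg_claim_qps

print(f"avg claim QPS:            {avg_claim_qps:.1f}")
print(f"avg document QPS:         {avg_doc_qps:.1f}")
print(f"documents per claim:      {docs_per_claim:.1f}")
print(f"claims per adjuster/day:  {claims_per_adjuster_per_day:.1f}")
print(f"peak-to-average ratio:    {peak_to_avg_ratio:.0f}")
```

This works out to roughly 14 claim submissions per second on average, 7.5 documents per claim, and about 67 claims per adjuster per day, so synchronous serving has to be sized for the 4,500 QPS peak rather than the daily average.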
Task
- Clarify the product goals and define what decisions must be made in real time vs asynchronously.
- Design an end-to-end ML architecture for claim intake, candidate retrieval of similar historical claims, ranking/prioritization, and downstream re-ranking or policy-rule enforcement.
- Choose models for each stage and explain feature design, labels, and how you would handle delayed outcomes such as final payout or confirmed fraud.
- Define batch and online serving paths, feature store requirements, and capacity planning at peak traffic.
- Propose offline and online evaluation, including business metrics, fairness/compliance checks, and rollout strategy.
- Identify key failure modes such as feature drift, training-serving skew, missing documents, and policy-rule changes.
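
One of the failure modes named above, feature drift, can be made concrete with a simple monitoring statistic. The following is a minimal sketch of the population stability index (PSI) for a single numeric feature, assuming a stored training-time baseline sample and a recent serving sample. The function name `psi`, the bin count, and the smoothing constant are illustrative choices, not part of the problem statement.

```python
import math
from bisect import bisect_right


def psi(baseline, current, n_bins=10, eps=1e-6):
    """Population stability index between two samples of one numeric feature."""
    ordered = sorted(baseline)
    # Interior bin edges taken at baseline quantiles, so each bin holds
    # roughly the same share of the baseline sample.
    edges = [ordered[int(len(ordered) * i / n_bins)] for i in range(1, n_bins)]

    def fractions(values):
        counts = [0] * n_bins
        for v in values:
            counts[bisect_right(edges, v)] += 1
        # Floor empty bins at eps to keep the logarithm finite.
        return [max(c / len(values), eps) for c in counts]

    p = fractions(baseline)
    q = fractions(current)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


# Hypothetical data: claim amounts at training time vs. a shifted serving window.
train_amounts = list(range(1000))
serving_amounts = [v + 300 for v in train_amounts]

print(psi(train_amounts, train_amounts))    # identical samples give 0.0
print(psi(train_amounts, serving_amounts))  # a clearly positive value
```

A commonly quoted rule of thumb treats PSI above about 0.25 as a significant shift, though any alerting threshold here would need to be tuned per feature.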
Constraints
- The system cannot auto-deny claims; high-risk predictions must route to human review.
- PII and medical data are regulated; training and serving must satisfy auditability and access controls.
- Some labels are delayed by weeks or months (fraud confirmation, litigation outcome, final settlement amount).
- New claim types and policy changes appear frequently, so the system must tolerate schema evolution and cold-start scenarios.
- Cost matters: only lightweight models may run synchronously on every claim; heavier document/image models should be precomputed or used selectively.
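
The first constraint can be enforced structurally rather than by convention. The sketch below assumes a lightweight synchronous model has already produced a fraud-risk score. The `Route` values, the `TriageScores` fields, and the 0.8 threshold are all hypothetical. The point is that the set of possible outcomes contains no denial, and that a missing score falls back to human review.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Route(Enum):
    FAST_TRACK = "fast_track"
    STANDARD_QUEUE = "standard_queue"
    HUMAN_REVIEW = "human_review"
    # Deliberately no DENY member: the system cannot auto-deny claims.


@dataclass(frozen=True)
class TriageScores:
    fraud_risk: Optional[float]  # in [0, 1]; None if the model was unavailable
    docs_complete: bool


def route_claim(scores: TriageScores, fraud_threshold: float = 0.8) -> Route:
    """Map triage scores to a queue; high or unknown risk goes to a human."""
    # Assumption: a missing score is treated conservatively.
    if scores.fraud_risk is None or scores.fraud_risk >= fraud_threshold:
        return Route.HUMAN_REVIEW
    # Incomplete paperwork is not eligible for fast-tracking.
    if not scores.docs_complete:
        return Route.STANDARD_QUEUE
    return Route.FAST_TRACK


print(route_claim(TriageScores(fraud_risk=0.93, docs_complete=True)))
print(route_claim(TriageScores(fraud_risk=0.10, docs_complete=False)))
print(route_claim(TriageScores(fraud_risk=0.10, docs_complete=True)))
```

Because the return type is an enum with no denial outcome, later rule or model changes cannot introduce an automatic denial without an explicit and auditable change to `Route`.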