Measure Fairness of AI Settlements

Business Context

You’re the analytics lead at ResolveAI, a fintech/legaltech platform that helps insurers and claimants settle auto injury and property damage claims. The product handles 3.5M claims per year across the US and UK, and ~40% of those claims now receive an AI-generated initial settlement offer that a human adjuster can accept, edit, or override. ResolveAI’s customers care about cycle time and cost, but they also face increasing scrutiny from regulators and plaintiffs’ attorneys over disparate outcomes across protected classes.

Over the last quarter, ResolveAI expanded the model to recommend offers for more complex injury claims. Customer success reports that cycle time improved (median time-to-offer down from 3.2 days to 1.1 days), but a large insurer paused rollout after an internal audit suggested the AI’s offers were “systematically lower” for certain demographic groups after controlling for claim severity. The CEO asks you to design a single, reportable fairness metric (with supporting cuts) that can be used in:

quarterly business reviews with insurers,
internal model release gates,
and (if needed) regulatory responses.

Metric Scenario

Stakeholders disagree on what “fair” means:

Legal/Compliance wants parity across protected classes (race/ethnicity proxies, gender, age band, disability status where available).
Claims Ops wants consistency with historical adjuster outcomes and reduced variance.
Finance worries that “fairness” metrics could be gamed by overpaying.
ML argues that fairness must be evaluated conditional on claim severity and jurisdiction.

You have to propose a metric that is: (a) hard to game, (b) decomposable into root causes, (c) stable enough to track weekly, and (d) aligned with business outcomes (retention, expansion, regulatory risk).

Available Data

Source	What it contains	Grain
`claims`	claim_id, insurer_id, jurisdiction, claim_type, injury_severity_score, policy_limits, claimant_age_band, claimant_gender (optional), protected_class_proxy (optional), attorney_representation, prior_claims_count	claim
`ai_offers`	claim_id, model_version, offer_amount, offer_timestamp, confidence_score, explanation_features (top-K), guardrail_flags	claim-offer
`adjuster_actions`	claim_id, adjuster_id, accepted_ai_offer (Y/N), edited_amount, override_reason, time_to_offer	claim
`settlements`	claim_id, final_settlement_amount, settlement_date, litigation_filed (Y/N), time_to_close	claim
`customer_complaints`	claim_id, complaint_type (bias/unfair/other), channel, resolution	complaint
`external_benchmarks`	jurisdiction-level typical payout ranges by claim_type/severity (industry data)	jurisdiction x segment

What You Need To Produce

Define a primary KPI for “AI settlement fairness” that can be tracked over time and compared across model versions.
Explain why your definition is appropriate for this domain (settlement negotiation, policy limits, jurisdictional variation) and what it does not capture.
Provide a calculation approach including: severity/jurisdiction adjustment, handling missing protected-class data, and minimum sample thresholds.
Decompose the metric into actionable drivers (where disparity is coming from: data, model, workflow, or downstream negotiation).
Propose benchmarks/targets for launch gates (e.g., “block release if fairness degrades by X”), and specify guardrail metrics to prevent “fairness via overpayment.”
Recommend concrete actions you would take if the metric worsens after a new model release.
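
As a starting point for the calculation approach, one possible sketch (not a prescribed answer) compares AI offers across groups within each jurisdiction and severity stratum, normalizes each offer by a benchmark payout, and skips cells below a minimum sample size. The names `severity_band`, `benchmark_mid`, `group`, and the threshold `MIN_N` are hypothetical: the source tables only provide `injury_severity_score`, jurisdiction-level payout ranges, and optional protected-class fields, so the banding, the range midpoint, and the threshold value are all assumptions.

```python
from collections import defaultdict
from statistics import median

MIN_N = 30  # hypothetical minimum claims per group within a stratum


def adjusted_offer_gap(rows, reference_group, min_n=MIN_N):
    """Stratified offer gap versus a reference group.

    rows: iterable of dicts with keys jurisdiction, severity_band, group,
          offer_amount, benchmark_mid (all names are illustrative).
    Returns {group: claim-weighted mean of (group median ratio /
             reference median ratio - 1)} over strata that pass min_n.
    """
    # Bucket offer-to-benchmark ratios by (jurisdiction, severity) and group.
    strata = defaultdict(lambda: defaultdict(list))
    for r in rows:
        if r["group"] is None:
            # Missing protected-class data is excluded here; the share of
            # excluded claims should be reported alongside the metric.
            continue
        key = (r["jurisdiction"], r["severity_band"])
        strata[key][r["group"]].append(r["offer_amount"] / r["benchmark_mid"])

    num = defaultdict(float)
    den = defaultdict(float)
    for groups in strata.values():
        ref = groups.get(reference_group, [])
        if len(ref) < min_n:
            continue  # no stable reference in this stratum
        ref_med = median(ref)
        for g, ratios in groups.items():
            if g == reference_group or len(ratios) < min_n:
                continue  # suppress small cells rather than report noise
            gap = median(ratios) / ref_med - 1.0
            num[g] += len(ratios) * gap
            den[g] += len(ratios)
    return {g: num[g] / den[g] for g in den}
```

A value of -0.10 for a group would mean its median normalized offer is about 10% below the reference group's, averaged over comparable strata. Because the metric uses the AI's initial offer rather than the final settlement, it is less exposed to negotiation and litigation effects, but it also does not capture what adjusters do afterwards.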

Constraints:

You must support weekly reporting, but also provide a quarterly view suitable for audits.
Protected-class attributes are incomplete and may be inferred via proxies in some jurisdictions.
The metric must be robust to the fact that the final settlement is influenced by negotiation, attorney involvement, and litigation.
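
To illustrate the weekly-versus-quarterly constraint, the sketch below pools a week with earlier weeks until a minimum claim count is reached, looking back at most 13 weeks (roughly one quarter). The function name, the `min_n` value, and the input shape are assumptions for illustration. Pooling per-week gaps by claim count is only an approximation; an audit-grade quarterly figure would more plausibly be recomputed from claim-level data.

```python
def pooled_weekly_gap(weekly, min_n=200, max_weeks=13):
    """Stabilize a weekly gap series by pooling with prior weeks.

    weekly: list of (week_label, n_claims, gap), oldest first.
    Returns a list of (week_label, pooled_gap_or_None, weeks_used).
    pooled_gap is None when even the full lookback is below min_n.
    """
    out = []
    for i, (label, _, _) in enumerate(weekly):
        window = weekly[max(0, i - max_weeks + 1): i + 1]
        n_total, weighted, used = 0, 0.0, 0
        # Walk backwards from the current week, adding older weeks
        # until the sample threshold is met.
        for _, n, gap in reversed(window):
            n_total += n
            weighted += n * gap
            used += 1
            if n_total >= min_n:
                break
        pooled = weighted / n_total if n_total >= min_n else None
        out.append((label, pooled, used))
    return out


series = [("W1", 100, -0.02), ("W2", 150, -0.04), ("W3", 300, -0.01)]
for label, gap, used in pooled_weekly_gap(series):
    print(label, gap, used)
```

Reporting `weeks_used` next to each value makes it visible when a "weekly" number is really a multi-week pool, which matters when comparing model versions released mid-window.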
