Diagnose Missed Fraud Cases

Context

ShieldPay runs a card-not-present fraud detection model that scores transactions in real time and sends high-risk cases to a manual review queue. A large enterprise customer says the platform is "missing the fraud we expected it to catch," even though the model still looks strong on aggregate ranking metrics.

Current Performance

Metric	Last Quarter	Current	Change
Precision	0.76	0.88	+0.12
Recall	0.81	0.58	-0.23
F1 Score	0.78	0.70	-0.08
AUC-ROC	0.93	0.92	-0.01
PR-AUC	0.41	0.36	-0.05
Fraud review rate	1.9%	1.1%	-0.8 pts
Monthly fraud loss at customer	$420K	$690K	+64%
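
As a quick consistency check on the table above, the following sketch recomputes the derived figures from the reported precision, recall, and loss values. The function and variable names are illustrative only and are not part of any ShieldPay tooling.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the table above.
f1_last = f1_score(precision=0.76, recall=0.81)
f1_now = f1_score(precision=0.88, recall=0.58)

# Share of fraud that is NOT flagged for review (1 - recall).
miss_rate_last = 1 - 0.81
miss_rate_now = 1 - 0.58

# Relative change in monthly fraud loss, in percent ($420K -> $690K).
loss_change_pct = (690 - 420) / 420 * 100

print(f"F1 last quarter: {f1_last:.2f}, current: {f1_now:.2f}")
print(f"Missed fraud share: {miss_rate_last:.0%} -> {miss_rate_now:.0%}")
print(f"Fraud loss change: {loss_change_pct:+.0f}%")
```

The recomputed F1 values and loss change agree with the table. The share of fraud that goes unflagged rises from 19% to 42%, which is the number the customer experiences most directly.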

The model threshold was raised from 0.42 to 0.63 six weeks ago to reduce analyst workload. The customer’s fraud base rate also increased from 0.35% to 0.52% after expansion into cross-border transactions.
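
To see why a threshold change can move recall sharply while a ranking metric stays put, here is a minimal sketch on synthetic scores. Everything in it is an assumption made for illustration: the Beta score distributions, the 5% fraud rate (inflated so the small sample is stable), and the helper names. Only the two thresholds, 0.42 and 0.63, come from the text above.

```python
import random


def recall_precision(scores, labels, threshold):
    """Recall and precision when every score >= threshold is flagged."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision


def auc_roc(scores, labels):
    """Rank-based AUC (Mann-Whitney U); ignores ties, which are
    negligible for continuous synthetic scores."""
    ranked = sorted(zip(scores, labels))
    rank_sum = sum(i + 1 for i, (_, y) in enumerate(ranked) if y == 1)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


random.seed(0)
# Synthetic data, NOT ShieldPay data: fraud scores skew high, legitimate skew low.
labels = [1 if random.random() < 0.05 else 0 for _ in range(20000)]
scores = [random.betavariate(5, 3) if y else random.betavariate(2, 6) for y in labels]

auc = auc_roc(scores, labels)  # computed once: it does not depend on any threshold
recall_low, precision_low = recall_precision(scores, labels, 0.42)
recall_high, precision_high = recall_precision(scores, labels, 0.63)

print(f"AUC-ROC (threshold-independent): {auc:.3f}")
print(f"threshold 0.42: recall={recall_low:.2f}, precision={precision_low:.2f}")
print(f"threshold 0.63: recall={recall_high:.2f}, precision={precision_high:.2f}")
```

The same scores and labels feed both threshold evaluations, so AUC-ROC is identical by construction while recall can only stay flat or fall as the threshold rises. This does not rule out the other causes listed in the problem statement; it only shows that the threshold change alone is enough to produce this pattern.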

The Problem

You need to determine whether the issue is thresholding, calibration drift, segment-specific underperformance, or a broader model quality problem. The customer wants a clear explanation for why fewer fraud cases are being caught despite strong AUC.

Requirements

Interpret the metrics and explain the most likely reason the customer perceives underperformance.
Identify at least 3 plausible root causes and how you would validate each.
Analyze what the confusion matrix implies for business impact.
Recommend specific model, threshold, and monitoring changes.
Explain what additional slices or offline evaluations you would request before retraining.

Constraints

Manual review capacity is capped at 6,000 transactions/day.
False positives create customer friction and merchant support costs.
Fraud labels arrive with a 21-day delay.
The business cannot accept a large increase in review volume without a strong expected ROI.
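
The review-capacity constraint can be turned into a threshold directly: given one day of scores, find the lowest cutoff whose flagged count still fits the queue. The sketch below is one hypothetical way to do that; `threshold_for_capacity` and the sample scores are illustrative, and a real queue would use the 6,000/day cap against actual daily score volumes.

```python
def threshold_for_capacity(scores, capacity):
    """Lowest threshold such that count(score >= threshold) <= capacity."""
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    if not scores:
        raise ValueError("scores must not be empty")
    ranked = sorted(scores, reverse=True)
    if capacity >= len(ranked):
        return ranked[-1]  # everything fits in the queue
    cutoff = ranked[capacity - 1]
    if ranked[capacity] < cutoff:
        return cutoff
    # Tied scores straddle the capacity boundary: step up to the next
    # distinct score so the flagged count does not exceed capacity.
    higher = [s for s in ranked if s > cutoff]
    return min(higher) if higher else float("inf")


# Toy scores standing in for one day of transactions (hypothetical values).
day_scores = [0.91, 0.15, 0.67, 0.42, 0.88, 0.30, 0.67]

t3 = threshold_for_capacity(day_scores, capacity=3)
t4 = threshold_for_capacity(day_scores, capacity=4)
flagged_at_t3 = sum(1 for s in day_scores if s >= t3)

print(t3, t4, flagged_at_t3)
```

With a capacity of 3, the two tied scores at 0.67 cannot both fit, so the cutoff moves up to 0.88 and only two transactions are flagged. Comparing a capacity-derived cutoff like this against the current 0.63 threshold indicates how much unused review headroom exists.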
