Evaluate Fraud Detection Error Tradeoff

Context

StripeShield uses a binary classification model to flag card transactions for fraud review before approval. The current model has reduced chargebacks overall, but the Risk team is concerned that the decision threshold is too conservative: it flags too few transactions and lets costly fraud through.

Current Performance

The model was evaluated on 1,000,000 recent transactions with an observed fraud rate of 0.8%.

Metric	Current Model
Precision	0.64
Recall	0.40
F1 Score	0.49
Accuracy	0.993
AUC-ROC	0.91
Fraud rate	0.008
Transactions flagged positive	5,000

Confusion matrix counts:

	Predicted Fraud	Predicted Legitimate
Actual Fraud	3,200	4,800
Actual Legitimate	1,800	990,200
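As a consistency check, every derived metric can be recomputed from the four confusion-matrix counts. A minimal Python sketch (the variable names are illustrative); note that accuracy works out to (3,200 + 990,200) / 1,000,000 = 0.9934.

```python
# Confusion-matrix counts copied from the table above.
TP, FN = 3_200, 4_800     # actual fraud: caught / missed
FP, TN = 1_800, 990_200   # actual legitimate: flagged / passed

total = TP + FN + FP + TN
flagged = TP + FP
fraud_rate = (TP + FN) / total
precision = TP / (TP + FP)          # share of flags that are real fraud
recall = TP / (TP + FN)             # share of fraud that gets flagged
f1 = 2 * precision * recall / (precision + recall)
accuracy = (TP + TN) / total

print(f"total={total:,} flagged={flagged:,} fraud_rate={fraud_rate:.3f}")
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f} accuracy={accuracy:.4f}")
```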

The Problem

Each false negative results in an average fraud loss of $240. Each false positive either sends a legitimate transaction to manual review or declines it, costing $6 in operations and customer friction. The Head of Risk wants to know whether the model should be tuned for higher recall, even if precision drops.
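The cost of the current errors follows directly from the counts and unit costs. The sketch below also derives the break-even fraud probability above which flagging a transaction pays off. That step rests on assumptions the case does not state: the model's scores are calibrated probabilities, a flagged fraud avoids the full $240 loss, and reviewing a true positive costs nothing.

```python
# Unit costs from the problem statement.
COST_FN = 240.0  # average fraud loss per missed fraud (false negative)
COST_FP = 6.0    # ops + friction cost per flagged legitimate transaction (false positive)

FN, FP = 4_800, 1_800  # current confusion-matrix counts

fn_cost = FN * COST_FN
fp_cost = FP * COST_FP
total_cost = fn_cost + fp_cost

# Assuming calibrated fraud probability p and zero cost for reviewing true fraud:
# flag when p * COST_FN > (1 - p) * COST_FP, i.e. p > COST_FP / (COST_FP + COST_FN).
breakeven_p = COST_FP / (COST_FP + COST_FN)

print(f"FN cost ${fn_cost:,.0f}  FP cost ${fp_cost:,.0f}  total ${total_cost:,.0f}")
print(f"share of cost from FNs: {fn_cost / total_cost:.1%}")
print(f"break-even fraud probability: {breakeven_p:.4f}")
```

Under those assumptions, false negatives account for about 99% of the current $1,162,800 error cost, and one caught fraud offsets 40 false positives ($240 / $6). Lowering the threshold therefore reduces total cost as long as the additional flags are more than roughly 2.4% fraud and stay within review capacity.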

Requirements

Interpret the trade-off between false positives and false negatives using the metrics above.
Quantify the business impact of current FP and FN errors.
Explain why accuracy is misleading in this setting.
Recommend whether to lower the decision threshold and what metric(s) should guide that choice.
Propose follow-up analyses to validate the recommendation.
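A trivial model that approves every transaction shows why accuracy is a poor guide here: with a 0.8% fraud rate it already scores 99.2%. The sketch below compares it with the current model on the same evaluation set using the $240 / $6 unit costs; the helper name `summarize` is illustrative.

```python
COST_FN, COST_FP = 240.0, 6.0
N, N_FRAUD = 1_000_000, 8_000   # evaluation-set size and fraud count

def summarize(tp: int, fp: int) -> tuple[float, float, float]:
    """Return (accuracy, recall, error cost) for a classifier on this evaluation set."""
    fn = N_FRAUD - tp
    tn = (N - N_FRAUD) - fp
    accuracy = (tp + tn) / N
    recall = tp / N_FRAUD
    cost = fn * COST_FN + fp * COST_FP
    return accuracy, recall, cost

current = summarize(tp=3_200, fp=1_800)  # the deployed model
never_flag = summarize(tp=0, fp=0)       # trivial "approve everything" baseline

for name, (acc, rec, cost) in [("current", current), ("never_flag", never_flag)]:
    print(f"{name:>10}: accuracy={acc:.4f} recall={rec:.2f} cost=${cost:,.0f}")
```

The two models differ by only 0.14 percentage points of accuracy but by $757,200 in error cost, because accuracy weights a $240 miss and a $6 false alarm equally and is dominated by the 992,000 legitimate transactions.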

Constraints

Manual review capacity is limited to 8,000 flagged transactions per day.
Customer experience is sensitive to unnecessary declines.
Fraud labels arrive with a 21-day delay, so evaluation is retrospective.
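One way to turn the recommendation into a concrete threshold is to sweep candidate thresholds over retrospectively labeled data (old enough that the 21-day labels have arrived) and keep the one with the lowest total error cost, subject to the review cap. A minimal pure-Python sketch follows. The function `choose_threshold` and its parameters are hypothetical, and it assumes every flagged transaction consumes one review slot, a flagged fraud avoids the full $240 loss, and `max_flags` has been scaled to the same time window as the data (8,000 per day times the number of days covered).

```python
from itertools import groupby

def choose_threshold(scores, labels, cost_fn=240.0, cost_fp=6.0, max_flags=None):
    """Hypothetical helper: return the cost-minimizing flag threshold on labeled data.

    scores: fraud scores, higher = more suspicious. labels: 1 = fraud, 0 = legitimate.
    A transaction is flagged when score >= threshold; threshold=inf means flag nothing.
    max_flags: optional review-capacity cap for the same time window as the data.
    """
    scores, labels = list(scores), list(labels)
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    n_fraud = sum(labels)

    # Start from "flag nothing": every fraud is missed.
    tp = fp = 0
    best = {"threshold": float("inf"), "flagged": 0, "tp": 0, "fp": 0,
            "cost": n_fraud * cost_fn}

    # Lower the threshold one distinct score at a time; tied scores are flagged together.
    for score, group in groupby(pairs, key=lambda p: p[0]):
        group_labels = [label for _, label in group]
        tp += sum(group_labels)
        fp += len(group_labels) - sum(group_labels)
        if max_flags is not None and tp + fp > max_flags:
            break  # this and any lower threshold would exceed review capacity
        cost = (n_fraud - tp) * cost_fn + fp * cost_fp
        if cost < best["cost"]:  # strict '<' keeps the fewest flags among equal costs
            best = {"threshold": score, "flagged": tp + fp, "tp": tp, "fp": fp,
                    "cost": cost}

    # Report precision and recall at the chosen threshold (None when undefined).
    best["precision"] = best["tp"] / best["flagged"] if best["flagged"] else None
    best["recall"] = best["tp"] / n_fraud if n_fraud else None
    return best

# Toy data, not from the case: 8 scored transactions, 3 of them fraud.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
labels = [1, 0, 1, 0, 0, 1, 0, 0]
print(choose_threshold(scores, labels))               # unconstrained: threshold 0.4
print(choose_threshold(scores, labels, max_flags=4))  # capped: threshold 0.7
```

Optimizing expected cost directly, rather than F1 or accuracy, encodes the 40:1 cost asymmetry, while the returned precision and recall show how many extra reviews and declines the new threshold would cause. The strict comparison breaks cost ties toward fewer flags, which favors customer experience.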
