Threshold, Workflow, or Accept

Context

ShopFlow uses a binary classifier to predict whether an order should be sent to manual fraud review before fulfillment. The customer is unhappy with current outcomes: analysts say too many reviewed orders are legitimate, while operations says some fraudulent orders are still shipping.

Current Performance

Metric	Current Model @ 0.60 Threshold	Alternative @ 0.45 Threshold	Target / Constraint
Precision	0.78	0.61	Review team wants >0.70
Recall	0.42	0.68	Risk team wants >0.60
F1 Score	0.55	0.64	Higher is better
AUC-ROC	0.86	0.86	Model ranking unchanged
False Positive Rate	0.9%	2.1%	Minimize customer friction
Orders flagged/day	1,150	2,050	Team capacity: 1,600/day
Fraud prevalence	1.8%	1.8%	Stable
Score calibration error	0.11	0.11	Lower is better
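The F1 row and the targets in the table can be checked directly from the precision, recall, and volume figures. The minimal Python sketch below recomputes F1 as the harmonic mean of precision and recall, then tests each operating point against the review team's precision target, the risk team's recall target, and the 1,600/day review capacity. The dictionary layout and function names are illustrative only and are not part of any ShopFlow tooling.

```python
# Operating points copied from the Current Performance table.
OPERATING_POINTS = {
    0.60: {"precision": 0.78, "recall": 0.42, "flagged_per_day": 1150},
    0.45: {"precision": 0.61, "recall": 0.68, "flagged_per_day": 2050},
}

# Targets and constraints from the same table.
MIN_PRECISION = 0.70    # review team wants > 0.70
MIN_RECALL = 0.60       # risk team wants > 0.60
REVIEW_CAPACITY = 1600  # manual reviews per day


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def check_targets(point: dict) -> dict:
    """Report which stakeholder targets a single operating point satisfies."""
    return {
        "precision_ok": point["precision"] > MIN_PRECISION,
        "recall_ok": point["recall"] > MIN_RECALL,
        "capacity_ok": point["flagged_per_day"] <= REVIEW_CAPACITY,
    }


for threshold, point in OPERATING_POINTS.items():
    f1 = f1_score(point["precision"], point["recall"])
    print(f"threshold {threshold:.2f}: F1 {f1:.2f} {check_targets(point)}")
```

Neither operating point clears all three checks: 0.60 misses the recall target, while 0.45 misses the precision target and exceeds review capacity.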

The Problem

The customer asks whether to tune the decision threshold, redesign the review workflow, or accept the current model behavior on the grounds that the model itself may already be near its practical limit. You need to interpret the metrics and recommend the best path.

Requirements

Determine whether threshold tuning alone can satisfy both fraud and operations goals.
Explain what it implies that AUC stays the same while precision and recall change.
Recommend when a workflow change is preferable to model tuning.
Identify whether calibration issues affect threshold decisions.
State when accepting current behavior is reasonable.
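Two of these requirements (the unchanged AUC and the role of calibration) hinge on one fact: AUC depends only on how the scores rank orders across all thresholds, so moving the threshold slides along the same ROC curve, and any strictly increasing transform of the scores changes calibration without changing AUC. The sketch below illustrates this on synthetic data. The beta-distributed scores, 20,000-order sample, and function names are hypothetical toy choices, not ShopFlow data, so the synthetic AUC will not match 0.86. It also assumes the table's "score calibration error" is comparable to an equal-width-bin expected calibration error (ECE), which the source does not actually define.

```python
import random


def make_synthetic_orders(n=20000, prevalence=0.018, seed=7):
    """Hypothetical scored orders (toy distribution, NOT ShopFlow data)."""
    rng = random.Random(seed)
    labels, scores = [], []
    for _ in range(n):
        is_fraud = rng.random() < prevalence
        # Fraud tends to score higher than legitimate orders, with overlap.
        scores.append(rng.betavariate(5, 3) if is_fraud else rng.betavariate(2, 8))
        labels.append(1 if is_fraud else 0)
    return labels, scores


def auc_mann_whitney(labels, scores):
    """AUC via the Mann-Whitney rank statistic (average ranks for ties)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def expected_calibration_error(labels, scores, n_bins=10):
    """Equal-width-bin ECE: weighted mean |observed fraud rate - mean score|."""
    bins = [[] for _ in range(n_bins)]
    for y, s in zip(labels, scores):
        bins[min(int(s * n_bins), n_bins - 1)].append((y, s))
    ece = 0.0
    for b in bins:
        if b:
            observed = sum(y for y, _ in b) / len(b)
            mean_score = sum(s for _, s in b) / len(b)
            ece += len(b) / len(labels) * abs(observed - mean_score)
    return ece


def precision_recall_at(labels, scores, threshold):
    """Precision, recall, and flag count when flagging score >= threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
    n_pos = sum(labels)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / n_pos if n_pos else 0.0
    return precision, recall, tp + fp


labels, raw = make_synthetic_orders()
squashed = [s ** 2 for s in raw]  # strictly increasing on [0, 1]: same ranking, new calibration

print("AUC raw / squashed:", round(auc_mann_whitney(labels, raw), 3),
      round(auc_mann_whitney(labels, squashed), 3))
print("ECE raw / squashed:", round(expected_calibration_error(labels, raw), 3),
      round(expected_calibration_error(labels, squashed), 3))
for t in (0.30, 0.45, 0.60, 0.75):
    p, r, flagged = precision_recall_at(labels, raw, t)
    print(f"threshold {t:.2f}: precision {p:.2f} recall {r:.2f} flagged {flagged}")
```

Squaring the scores leaves AUC identical but changes ECE, and a threshold t on the raw scores flags exactly the same orders as t² on the squared scores. In this sketch, calibration error does not limit which precision/recall trade-offs a threshold can reach; it only means the numeric threshold cannot be read directly as a fraud probability when translating dollar costs into a cutoff.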

Constraints

Manual review capacity cannot exceed 1,600 orders/day this quarter.
False negatives cost about $120 per missed fraudulent order.
False positives cost $6 in analyst time plus customer-delay risk.
Retraining the model would take 4 weeks; threshold or workflow changes can ship in days.
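The two cost figures can be turned into a decision rule. If the scores were calibrated fraud probabilities p, the standard cost-sensitive rule flags an order when p × $120 > (1 − p) × $6, that is, when p > 6 / 126 ≈ 0.048. The minimal sketch below computes that cutoff and a rough daily error cost for each operating point in the table, back-solving missed fraud from recall. The function names are hypothetical, and customer-delay risk is deliberately left unpriced because the constraints give no dollar figure for it.

```python
C_FN = 120.0  # dollars per missed fraudulent order (from the constraints)
C_FP = 6.0    # dollars of analyst time per false positive; customer-delay risk NOT priced


def bayes_threshold(c_fp: float, c_fn: float) -> float:
    """Flag when p * c_fn > (1 - p) * c_fp, i.e. when p > c_fp / (c_fp + c_fn).

    Only meaningful if p is a calibrated fraud probability.
    """
    return c_fp / (c_fp + c_fn)


def daily_cost(flagged_per_day: float, precision: float, recall: float,
               c_fp: float = C_FP, c_fn: float = C_FN) -> float:
    """Rough daily error cost implied by one operating point.

    Missed fraud is back-solved from recall: FN = TP * (1 - recall) / recall.
    Assumes every flagged order is actually reviewed.
    """
    tp = flagged_per_day * precision
    fp = flagged_per_day - tp
    fn = tp * (1 - recall) / recall
    return fp * c_fp + fn * c_fn


print(f"cost-optimal cutoff on calibrated probabilities: {bayes_threshold(C_FP, C_FN):.3f}")
table_points = {0.60: (1150, 0.78, 0.42), 0.45: (2050, 0.61, 0.68)}
for threshold, (flagged, prec, rec) in table_points.items():
    print(f"threshold {threshold:.2f}: ~${daily_cost(flagged, prec, rec):,.0f}/day in error cost")
```

Treat the output as a rough comparison. The 0.45 figure assumes all 2,050 daily flags are reviewed, which the 1,600/day capacity rules out this quarter, and pricing customer-delay risk would raise C_FP and push the cost-optimal cutoff upward. Because the table reports a calibration error of 0.11, the 0.048 cutoff would apply to recalibrated probabilities rather than directly to the raw 0.45 or 0.60 scores.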
