Choose AUC-ROC or F1

Scenario

You own a binary classifier that prioritizes suspicious account sign-ins for manual review in Microsoft Defender. The current logistic regression model and a new LightGBM challenger are both evaluated offline before deployment, and accounts scoring above a 0.40 threshold are sent to analysts. Security leadership notices the challenger has a slightly higher AUC-ROC, but the operations team prefers the current model because it produces better precision and F1 at the chosen threshold. You need to explain what each metric is actually measuring and which one should guide model selection for this use case.

Performance Data

Metric	Current Model	Challenger Model
AUC-ROC	0.91	0.94
Precision @ 0.40	0.74	0.61
Recall @ 0.40	0.68	0.79
F1 Score @ 0.40	0.71	0.69
False Positive Rate @ 0.40	0.032	0.071
Daily alerts sent to analysts	4,300	7,100
Analyst review capacity/day	5,000	5,000
Positive class prevalence	2.8%	2.8%
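
The arithmetic behind the table can be checked with a short sketch. It recomputes F1 from the reported precision and recall and compares alert volume with analyst capacity. The variable names are illustrative. Two assumptions are not stated in the source: the daily alert count is taken to equal the number of sign-ins scoring above 0.40, and offline precision is assumed to carry over to production alerts.

```python
# Figures copied from the Performance Data table.
models = {
    "current": {"precision": 0.74, "recall": 0.68, "alerts": 4300},
    "challenger": {"precision": 0.61, "recall": 0.79, "alerts": 7100},
}
CAPACITY = 5000  # analyst reviews available per day


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


summary = {}
for name, m in models.items():
    summary[name] = {
        # F1 recomputed from the table's precision and recall.
        "f1": round(f1_score(m["precision"], m["recall"]), 2),
        # ASSUMPTION: alerts == predicted positives, so alerts * precision
        # estimates how many alerts are truly suspicious.
        "expected_true_positives": round(m["alerts"] * m["precision"]),
        # Alerts that exceed what analysts can review in a day.
        "backlog": max(0, m["alerts"] - CAPACITY),
    }

for name, row in summary.items():
    print(name, row)
```

Under these assumptions the recomputed F1 values match the table (0.71 and 0.69), and the challenger exceeds daily review capacity by 2,100 alerts while the current model stays within it.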

Question

How would you explain the difference between AUC-ROC and F1-score using these results, and when would you prefer one over the other for selecting or tuning this model?
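
One way to make the distinction concrete is a minimal sketch on toy data (the labels and scores below are invented for illustration and are not taken from either model). AUC-ROC depends only on how the scores rank positives against negatives, so a monotonic rescaling of the scores leaves it unchanged. F1 is computed at a fixed threshold, so the same rescaling can change it. The helper functions are written in plain Python for this example and are not a library API.

```python
def auc_roc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form equals the area under
    the ROC curve.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def f1_at(labels, scores, threshold):
    """F1 when every score strictly above the threshold is flagged."""
    tp = sum(1 for y, s in zip(labels, scores) if s > threshold and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s > threshold and y == 0)
    fn = sum(1 for y, s in zip(labels, scores) if s <= threshold and y == 1)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


# Toy data, invented for illustration only.
labels = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
scores = [0.95, 0.80, 0.62, 0.55, 0.45, 0.42, 0.30, 0.20, 0.10, 0.05]

# Squaring is monotonic on [0, 1]: the ranking is preserved, the scale is not.
squashed = [s ** 2 for s in scores]

print("AUC original:", auc_roc(labels, scores))
print("AUC squashed:", auc_roc(labels, squashed))
print("F1 @ 0.40 original:", f1_at(labels, scores, 0.40))
print("F1 @ 0.40 squashed:", f1_at(labels, squashed, 0.40))
```

On this toy data the AUC is identical for both score sets, while F1 at 0.40 drops after rescaling because fewer positives clear the threshold. This is why two models can rank in one order on AUC-ROC and in the opposite order on F1 at a chosen operating point.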
