



You are reviewing a binary classifier that flags items for human review. The team says the model looks good overall, but reviewers are missing too many true positives and also spending time on false alarms. You need to judge how precision and recall should be evaluated for this system.
How would you evaluate precision and recall for an AI system?
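
One possible starting point, sketched here under stated assumptions: missed true positives are false negatives (they lower recall), and false alarms are false positives (they lower precision), so both metrics can be computed from confusion counts at each candidate flagging threshold. The function name `precision_recall`, the toy `labels` and `scores`, and the thresholds below are hypothetical and chosen only for illustration; they do not come from the system described above.

```python
def precision_recall(labels, scores, threshold):
    """Return (precision, recall) when items with score >= threshold are flagged.

    labels: 1 for a true positive item, 0 otherwise (assumed ground truth).
    scores: model scores in the same order as labels.
    """
    tp = fp = fn = 0
    for label, score in zip(labels, scores):
        flagged = score >= threshold
        if flagged and label == 1:
            tp += 1  # correctly flagged for review
        elif flagged and label == 0:
            fp += 1  # false alarm: reviewer time spent on a negative
        elif not flagged and label == 1:
            fn += 1  # missed true positive
    # Return 0.0 when a metric is undefined (nothing flagged, or no positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical toy data, for illustration only.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.2, 0.1, 0.3, 0.6]

for threshold in (0.25, 0.5, 0.75):
    p, r = precision_recall(labels, scores, threshold)
    print(f"threshold={threshold:.2f} precision={p:.2f} recall={r:.2f}")
```

Sweeping the threshold this way makes the trade-off explicit: on this toy data, a lower threshold flags more items and raises recall at the cost of precision, while a higher threshold does the reverse. A single "looks good overall" figure such as accuracy can hide both failure modes, especially when positives are rare.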