You are building a fraud detection model for a digital payments product. The goal is to predict whether a transaction is fraudulent before it is approved, but fraud cases are rare compared with legitimate transactions.
How would you approach training and evaluating a model on this highly imbalanced classification problem, and what tradeoffs would guide your choices around sampling, model selection, thresholding, and deployment?
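
As one concrete way to frame the thresholding tradeoff, the sketch below picks a decision threshold by minimizing expected misclassification cost rather than maximizing accuracy. Everything in it is an illustrative assumption, not part of the question: the scores and labels are made-up toy data, the function names `confusion_at` and `pick_threshold` are hypothetical, and the cost values (a missed fraud costing 50 times a false alarm) are placeholders that a real team would estimate from chargeback and customer-friction data.

```python
from typing import List, Tuple

def confusion_at(scores: List[float], labels: List[int],
                 threshold: float) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) when flagging every score >= threshold as fraud."""
    tp = fp = fn = tn = 0
    for score, label in zip(scores, labels):
        flagged = score >= threshold
        if flagged and label == 1:
            tp += 1
        elif flagged and label == 0:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def pick_threshold(scores: List[float], labels: List[int],
                   cost_fp: float, cost_fn: float) -> Tuple[float, float]:
    """Return (threshold, total_cost) minimizing cost_fp * FP + cost_fn * FN.

    Candidates are each distinct score plus +inf (flag nothing).
    Ties go to the lowest threshold.
    """
    candidates = sorted(set(scores)) + [float("inf")]
    best_threshold, best_cost = candidates[0], float("inf")
    for threshold in candidates:
        _, fp, fn, _ = confusion_at(scores, labels, threshold)
        cost = cost_fp * fp + cost_fn * fn
        if cost < best_cost:
            best_threshold, best_cost = threshold, cost
    return best_threshold, best_cost

# Toy data (assumed): 18 legitimate transactions and 2 frauds, with model scores.
legit_scores = [0.01, 0.02, 0.03, 0.05, 0.05, 0.08, 0.10, 0.12, 0.15,
                0.20, 0.22, 0.25, 0.30, 0.35, 0.40, 0.45, 0.55, 0.60]
fraud_scores = [0.50, 0.90]
scores = legit_scores + fraud_scores
labels = [0] * len(legit_scores) + [1] * len(fraud_scores)

# Accuracy of a model that never flags fraud: high, yet it catches nothing.
baseline_accuracy = labels.count(0) / len(labels)

# Hypothetical costs: a false alarm costs 1 unit, a missed fraud costs 50.
threshold, cost = pick_threshold(scores, labels, cost_fp=1.0, cost_fn=50.0)
tp, fp, fn, tn = confusion_at(scores, labels, threshold)
precision = tp / (tp + fp) if (tp + fp) else 0.0
recall = tp / (tp + fn) if (tp + fn) else 0.0
print(f"baseline accuracy={baseline_accuracy:.2f}")
print(f"threshold={threshold}, cost={cost}, precision={precision:.2f}, recall={recall:.2f}")
```

On this toy data, changing the cost ratio moves the chosen threshold, which is the point of the exercise: the operating point is a business decision layered on top of the model's scores, and it should be chosen on held-out data that keeps the original class balance, not on a resampled training set.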