Business Context
You’re a data scientist at PayWave, a fintech processing 8–12 million card-not-present transactions per day across North America. Fraud losses are tightly monitored: every additional $10k/day in fraud is escalated, but overly aggressive blocking also harms customer trust and increases support costs.
A new real-time fraud model emits a binary signal Y, with Y = 1 (“alert”) when it judges a transaction suspicious. The fraud operations team wants a simple, interpretable number: given an alert, what is the probability that the transaction is actually fraudulent? This number drives the decision to auto-decline the transaction, step up authentication, or send it to manual review.
Problem Statement
Compute the Bayesian probability P(X=1∣Y=1), where:
- X = 1 means the transaction is truly fraudulent.
- Y = 1 means the model raised an alert.
Then quantify what this implies operationally at PayWave’s scale.
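For reference, the standard expansion of this quantity is shown here. The denominator comes from the law of total probability over the two fraud states, so everything is expressed in terms of the prevalence, true positive rate, and false positive rate listed under Given Data:

$$
\begin{aligned}
P(Y=1) &= P(Y=1 \mid X=1)\,P(X=1) + P(Y=1 \mid X=0)\,\bigl(1 - P(X=1)\bigr) \\
P(X=1 \mid Y=1) &= \frac{P(Y=1 \mid X=1)\,P(X=1)}{P(Y=1)}
\end{aligned}
$$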
Given Data
Assume the following metrics were estimated from a recent backtest on a representative week of traffic:
| Quantity | Meaning | Value |
|---|---|---|
| P(X=1) | Base fraud rate (prevalence) | 0.0032 |
| P(Y=1∣X=1) | True positive rate (sensitivity / recall) | 0.91 |
| P(Y=1∣X=0) | False positive rate | 0.018 |
| Daily volume | Transactions per day | 10,000,000 |
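The table values can be plugged into Bayes' theorem directly. The following Python sketch does this. The function and constant names are illustrative choices, not from any PayWave codebase, and the rates are taken as stable per the stated assumptions.

```python
# Sketch: posterior fraud probability given an alert, from the backtest rates.
# Names are hypothetical; values come from the Given Data table.

PREVALENCE = 0.0032  # P(X=1), base fraud rate
TPR = 0.91           # P(Y=1 | X=1), true positive rate
FPR = 0.018          # P(Y=1 | X=0), false positive rate


def alert_rate(prevalence: float, tpr: float, fpr: float) -> float:
    """P(Y=1) via the law of total probability over fraud / not fraud."""
    return tpr * prevalence + fpr * (1.0 - prevalence)


def posterior_fraud_given_alert(prevalence: float, tpr: float, fpr: float) -> float:
    """P(X=1 | Y=1) via Bayes' theorem."""
    return tpr * prevalence / alert_rate(prevalence, tpr, fpr)


p_alert = alert_rate(PREVALENCE, TPR, FPR)
p_fraud_given_alert = posterior_fraud_given_alert(PREVALENCE, TPR, FPR)

print(f"P(Y=1)       = {p_alert:.7f}")             # prints 0.0208544
print(f"P(X=1 | Y=1) = {p_fraud_given_alert:.4f}")  # prints 0.1396
```

The model catches 91% of fraud, yet the posterior is only about 0.14, so roughly one alert in seven is fraudulent. The 0.9968 share of legitimate traffic multiplies even a small false positive rate into most of the alert volume.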
Requirements
- Write Bayes’ theorem for P(X=1∣Y=1) in terms of the given quantities.
- Compute P(Y=1), the overall probability a transaction triggers an alert.
- Compute P(X=1∣Y=1) numerically (this is the posterior fraud probability given an alert).
- Convert the result into expected daily counts: expected number of alerts per day and expected number of true frauds among those alerts.
- Briefly interpret: if PayWave auto-declines every alert, what precision does that imply, and what trade-off does the low base rate create?
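The count conversion and the base-rate trade-off can be sketched together in Python. The daily-volume arithmetic uses the table values. The extra base rates in the second loop (0.001, 0.01, 0.05) are hypothetical values chosen only to show the trend with TPR and FPR held fixed. They are not backtest results.

```python
# Sketch: expected daily counts at PayWave's volume, plus how precision
# (= P(X=1 | Y=1), the share of auto-declines that are real fraud) moves
# with the base rate. Names and the extra base rates are hypothetical.

DAILY_VOLUME = 10_000_000
PREVALENCE = 0.0032  # from the Given Data table
TPR = 0.91
FPR = 0.018


def expected_daily_counts(volume, prevalence, tpr, fpr):
    """Expected (mean) daily counts, not guaranteed daily outcomes."""
    frauds = volume * prevalence
    true_alerts = frauds * tpr
    false_alerts = (volume - frauds) * fpr
    return {
        "frauds": frauds,
        "alerts": true_alerts + false_alerts,
        "true_alerts": true_alerts,
        "false_alerts": false_alerts,
        "missed_frauds": frauds - true_alerts,
    }


counts = expected_daily_counts(DAILY_VOLUME, PREVALENCE, TPR, FPR)
for name, value in counts.items():
    print(f"{name:>14}: {value:,.0f}")

# Precision of "auto-decline every alert" at a few illustrative base rates;
# only the 0.0032 row corresponds to the backtest.
for prevalence in (0.001, 0.0032, 0.01, 0.05):
    c = expected_daily_counts(DAILY_VOLUME, prevalence, TPR, FPR)
    print(f"base rate {prevalence:.4f} -> precision {c['true_alerts'] / c['alerts']:.3f}")
```

At the backtest base rate this gives about 208,544 alerts per day, of which about 29,120 are fraud and about 179,424 are legitimate. Auto-declining every alert would therefore turn away roughly 86% legitimate customers, while about 2,880 frauds per day still go unflagged. With TPR and FPR fixed, precision rises steeply as the base rate grows, which is the trade-off the low prevalence creates.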
Assumptions and Constraints
- Treat the backtest rates as stable and applicable to production (no distribution shift).
- Alerts on different transactions are conditionally independent given each transaction’s fraud status (i.e., the model’s true and false positive rates fully summarize its behavior).
- Ignore downstream effects (fraudsters adapting, customer churn due to false declines) for the calculation; discuss them qualitatively in interpretation.
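Under these assumptions, a quick Monte Carlo simulation should reproduce the closed-form posterior, which is a useful sanity check on the arithmetic. The Python sketch below draws each transaction's fraud status independently with probability 0.0032. It then draws its alert independently given that status, using the table's TPR and FPR. The sample size and seed are arbitrary illustrative choices.

```python
import random

# Sketch: simulate alerts under the stated assumptions (stable rates,
# independence given fraud status) and compare the simulated precision
# with the closed-form Bayes posterior. Sample size and seed are arbitrary.

PREVALENCE = 0.0032  # P(X=1)
TPR = 0.91           # P(Y=1 | X=1)
FPR = 0.018          # P(Y=1 | X=0)


def simulate_precision(n_transactions: int, seed: int = 0) -> float:
    """Fraction of simulated alerts that are truly fraudulent."""
    rng = random.Random(seed)
    true_alerts = 0
    alerts = 0
    for _ in range(n_transactions):
        is_fraud = rng.random() < PREVALENCE
        alert_prob = TPR if is_fraud else FPR
        if rng.random() < alert_prob:
            alerts += 1
            true_alerts += is_fraud  # bool counts as 0 or 1
    return true_alerts / alerts


closed_form = TPR * PREVALENCE / (TPR * PREVALENCE + FPR * (1 - PREVALENCE))
simulated = simulate_precision(500_000)
print(f"closed form: {closed_form:.4f}  simulated: {simulated:.4f}")
```

The simulation only confirms the calculation under these assumptions. It cannot reveal distribution shift, fraudster adaptation, or churn from false declines, which is why those effects are left to the qualitative interpretation.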