Business Context
LendWise uses an ML model to flag potentially fraudulent loan applications before approval. Fraud is rare, so the risk team wants to understand how Bayes' theorem and conditional probability translate a model alert into an actual probability of fraud.
Problem Statement
A binary classifier labels applications as Flagged or Not Flagged. You need to compute the posterior probability that an application is truly fraudulent given that the model flagged it, and explain how this would be used in a practical ML decision pipeline.
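For reference, the governing identity is Bayes' theorem over the Fraud / Not Fraud partition, with the overall flag rate supplied by the law of total probability:

```latex
P(\mathrm{Fraud} \mid \mathrm{Flagged})
  = \frac{P(\mathrm{Flagged} \mid \mathrm{Fraud})\, P(\mathrm{Fraud})}{P(\mathrm{Flagged})},
\qquad
P(\mathrm{Flagged})
  = P(\mathrm{Flagged} \mid \mathrm{Fraud})\, P(\mathrm{Fraud})
  + P(\mathrm{Flagged} \mid \mathrm{Not\ Fraud})\, P(\mathrm{Not\ Fraud})
```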
Given Data
| Metric | Value |
|---|---|
| Daily applications | 100,000 |
| Base fraud rate | 0.8% |
| Model sensitivity: P(Flagged∣Fraud) | 92% |
| Model false positive rate: P(Flagged∣Not Fraud) | 4.5% |
| Manual review cost per flagged application | $3.20 |
| Expected loss if fraud is approved | $1,800 |
| Escalation threshold (minimum posterior fraud probability) | 20% |
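A minimal sketch of how these inputs combine (variable names are illustrative, not part of any LendWise system): the flag rate comes from the law of total probability, and the posterior from Bayes' theorem.

```python
# Given values from the table above.
p_fraud = 0.008              # base fraud rate, P(Fraud)
sensitivity = 0.92           # P(Flagged | Fraud)
false_positive_rate = 0.045  # P(Flagged | Not Fraud)

# Law of total probability over the Fraud / Not Fraud partition.
p_flagged = sensitivity * p_fraud + false_positive_rate * (1 - p_fraud)

# Bayes' theorem: posterior probability of fraud given a flag.
p_fraud_given_flagged = sensitivity * p_fraud / p_flagged

print(f"P(Flagged) = {p_flagged:.4f}")                      # 0.0520
print(f"P(Fraud | Flagged) = {p_fraud_given_flagged:.4f}")  # 0.1415
```

Under the stated values, only about 14% of flagged applications are actually fraudulent despite 92% sensitivity, which is the class-imbalance effect the requirements below examine.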
Requirements
- State Bayes' theorem in this setting.
- Compute the probability an application is flagged: P(Flagged).
- Compute the posterior probability of fraud given a flag: P(Fraud∣Flagged).
- Calculate the expected number of flagged applications and true fraud cases among them per day (see the sketch after this list).
- Decide whether a flagged application should automatically go to manual review if the escalation threshold is 20% posterior fraud probability.
- Briefly explain how conditional probability should influence threshold selection in an ML system with class imbalance.
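The sketch below reuses the posterior from the earlier snippet to address the daily-volume and escalation items; the expected-cost comparison at the end is one reasonable way to stress-test the 20% threshold, not a prescribed LendWise policy, and all names are illustrative.

```python
# Given values.
daily_applications = 100_000
p_fraud = 0.008              # base fraud rate, P(Fraud)
sensitivity = 0.92           # P(Flagged | Fraud)
false_positive_rate = 0.045  # P(Flagged | Not Fraud)
review_cost = 3.20           # manual review cost per flagged application ($)
loss_if_approved = 1_800.00  # expected loss if a fraudulent application is approved ($)
escalation_threshold = 0.20  # required posterior fraud probability for escalation

# Flag rate and posterior, as in the earlier sketch.
p_flagged = sensitivity * p_fraud + false_positive_rate * (1 - p_fraud)
p_fraud_given_flagged = sensitivity * p_fraud / p_flagged

# Expected daily volumes: roughly 5,200 flags, about 736 of which are true fraud.
expected_flags = daily_applications * p_flagged
expected_fraud_among_flags = daily_applications * sensitivity * p_fraud

# Threshold rule: the posterior (~14.2%) falls short of 20%, so a flag
# alone would not trigger automatic escalation under the stated rule.
auto_escalate = p_fraud_given_flagged >= escalation_threshold

# Expected-cost view: skipping review risks posterior * $1,800 (~$255)
# per flag, versus a $3.20 review cost, assuming review blocks all fraud.
expected_loss_without_review = p_fraud_given_flagged * loss_if_approved

print(f"Expected flags per day: {expected_flags:.0f}")
print(f"Expected true fraud among flags: {expected_fraud_among_flags:.0f}")
print(f"Auto-escalate under 20% rule: {auto_escalate}")
print(f"Expected loss if unreviewed: ${expected_loss_without_review:.2f} vs review cost ${review_cost:.2f}")
```

The gap between the 20% posterior bar and the break-even posterior implied by the costs (3.20 / 1,800, roughly 0.18%) is one concrete way to frame the final requirement: under heavy class imbalance, a threshold chosen on raw posterior alone can sit far from the cost-optimal operating point.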
Assumptions
- The sensitivity and false positive rate are stable and estimated from recent holdout data.
- Applications are independent of one another.
- Fraud prevalence remains at 0.8% during deployment.
- Manual review perfectly blocks fraud once escalated.