Business Context
PayWave processes roughly 8 million card transactions per day. Fraud is rare but expensive, so the risk team needs a binary classifier that catches fraudulent transactions without overwhelming manual reviewers with false positives.
Dataset
You are given a historical transaction dataset for offline model development.
| Feature Group | Count | Examples |
|---|---|---|
| Transaction attributes | 12 | amount, merchant_category, card_present, channel, currency |
| Customer behavior | 9 | avg_amount_7d, txn_count_24h, device_count_30d, chargeback_rate_user |
| Merchant risk | 6 | merchant_chargeback_rate, merchant_country, new_merchant_flag |
| Temporal / geo | 7 | hour_of_day, day_of_week, distance_from_home, ip_country_mismatch |
| Device / network | 8 | device_id_hash, browser_family, proxy_flag, email_domain_risk |
- Size: 1.2M transactions, 42 features
- Target: is_fraud (1 = fraudulent transaction, 0 = legitimate)
- Class balance: 0.9% fraud, 99.1% non-fraud
- Missing data: ~18% missing in device/network fields, ~6% missing in geo features, negligible missingness elsewhere
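
To make the class balance concrete, the arithmetic below turns the stated percentages into row counts and candidate class weights. It is a rough sketch that assumes the 0.9% fraud rate applies exactly to all 1.2M rows. `balanced_weights` is a hypothetical helper using the common n_samples / (n_classes * n_class_samples) heuristic, not something prescribed by the dataset.

```python
# Rough arithmetic implied by the dataset summary (assumes the stated
# 0.9% fraud rate applies exactly to all 1.2M rows).
N_ROWS = 1_200_000
FRAUD_RATE = 0.009

n_fraud = round(N_ROWS * FRAUD_RATE)  # positive (fraud) rows
n_legit = N_ROWS - n_fraud            # negative (legitimate) rows
neg_pos_ratio = n_legit / n_fraud     # often used as a positive-class weight


def balanced_weights(n_neg, n_pos):
    """Hypothetical helper: 'balanced' heuristic n / (n_classes * n_class)."""
    total = n_neg + n_pos
    return {0: total / (2 * n_neg), 1: total / (2 * n_pos)}


weights = balanced_weights(n_legit, n_fraud)
print(f"fraud rows: {n_fraud:,}, legitimate rows: {n_legit:,}")
print(f"negative:positive ratio: {neg_pos_ratio:.1f}")
print(f"balanced class weights: {weights}")
```

With only about 10,800 fraud rows, any validation or test split should keep enough positives for the metrics to be stable.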
Success Criteria
A good solution should achieve strong minority-class detection while keeping review volume manageable:
- PR AUC >= 0.45
- Recall >= 0.75 at precision >= 0.20
- Top-1% lift >= 12x versus random selection
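
The three criteria above can be computed from a list of labels and model scores. The following is a minimal pure-Python sketch for illustration only. The function names are hypothetical, PR AUC is approximated by average precision, and tied scores in `average_precision` are broken by input order, which can differ slightly from library implementations that group tied scores.

```python
def _ranked(scores):
    """Indices sorted by descending score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def average_precision(y_true, scores):
    """PR AUC estimate: mean of precision at the rank of each positive."""
    n_pos = sum(y_true)
    if n_pos == 0:
        return 0.0
    hits, total = 0, 0.0
    for rank, i in enumerate(_ranked(scores), start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / rank
    return total / n_pos


def recall_at_precision(y_true, scores, min_precision):
    """Best recall over thresholds with precision >= min_precision.

    Returns (recall, threshold), or (0.0, None) if no threshold qualifies.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        return 0.0, None
    order = _ranked(scores)
    best_recall, best_threshold, tp = 0.0, None, 0
    for k, i in enumerate(order, start=1):
        tp += y_true[i]
        # Only cut between distinct scores so the threshold is well defined.
        if k < len(order) and scores[order[k]] == scores[i]:
            continue
        precision, recall = tp / k, tp / n_pos
        if precision >= min_precision and recall > best_recall:
            best_recall, best_threshold = recall, scores[i]
    return best_recall, best_threshold


def lift_at_top(y_true, scores, fraction=0.01):
    """Fraud rate in the top-scored fraction divided by the overall rate."""
    k = max(1, int(len(scores) * fraction))
    top_rate = sum(y_true[i] for i in _ranked(scores)[:k]) / k
    base_rate = sum(y_true) / len(y_true)
    return top_rate / base_rate


# Toy data only; real evaluation would use held-out transactions.
y_true = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
print(average_precision(y_true, scores))
print(recall_at_precision(y_true, scores, min_precision=0.20))
print(lift_at_top(y_true, scores, fraction=0.1))
```

At the stated 0.9% base rate, a 12x lift means a fraud rate of about 10.8% within the top-scored 1% of transactions, which is roughly 12% of all fraud captured in that slice.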
Constraints
- Batch scoring is acceptable, but per-transaction inference should remain under 50 ms in production.
- The fraud operations team needs feature importances and a rationale for the chosen decision threshold.
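- Avoid data leakage from future behavior aggregates: features such as avg_amount_7d and txn_count_24h must be computed only from data available before each transaction.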
- False positives have operational cost because each flagged transaction may trigger manual review or customer friction.
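
To illustrate the leakage constraint, the sketch below computes a 24-hour transaction count per customer using only strictly earlier transactions. The input layout, a list of (customer_id, timestamp in epoch seconds) pairs, and the function name are assumptions for illustration. The dataset description does not specify an identifier or timestamp column.

```python
from bisect import bisect_left
from collections import defaultdict

WINDOW_SECONDS = 24 * 60 * 60


def past_only_txn_count_24h(transactions):
    """Count each customer's transactions in the preceding 24 hours.

    `transactions` is a list of (customer_id, timestamp_seconds) pairs
    (assumed layout). The current transaction and anything at the same
    or a later timestamp are excluded, so no future data leaks in.
    """
    # Process in time order so each customer's history stays sorted.
    order = sorted(range(len(transactions)), key=lambda i: transactions[i][1])
    history = defaultdict(list)  # customer_id -> sorted past timestamps
    counts = [0] * len(transactions)
    for i in order:
        customer_id, ts = transactions[i]
        past = history[customer_id]
        # Past timestamps in the half-open window [ts - 24h, ts).
        start = bisect_left(past, ts - WINDOW_SECONDS)
        end = bisect_left(past, ts)
        counts[i] = end - start
        past.append(ts)
    return counts


txns = [("a", 0), ("a", 3600), ("b", 4000), ("a", 90000)]
print(past_only_txn_count_24h(txns))
```

The same principle would apply to the train/validation split: splitting by time rather than at random keeps later behavior out of the training data.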
Deliverables
- Propose and implement a modeling approach for this imbalanced binary classification problem.
- Explain which imbalance-handling techniques you would use and why.
- Build a preprocessing and training pipeline that handles missing values and mixed feature types.
- Evaluate the model with metrics appropriate for rare-event detection, not just accuracy.
- Recommend a decision threshold based on business tradeoffs between fraud capture and false positives.
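
One possible way to frame the threshold recommendation is to minimize expected cost over candidate thresholds. The sketch below is illustrative only: the cost values are hypothetical placeholders, not figures from PayWave, and the brute-force sweep is meant for small validation samples rather than the full dataset.

```python
def pick_threshold(y_true, scores, cost_fp, cost_fn, candidates=None):
    """Return (threshold, cost, fp, fn) minimizing cost_fp*FP + cost_fn*FN.

    A transaction is flagged when its score is >= threshold. On cost
    ties, the lowest candidate threshold is kept.
    """
    if candidates is None:
        candidates = sorted(set(scores))
    best = None
    for t in candidates:
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        fn = sum(1 for y, s in zip(y_true, scores) if s < t and y == 1)
        cost = cost_fp * fp + cost_fn * fn
        if best is None or cost < best[1]:
            best = (t, cost, fp, fn)
    return best


# Toy data and hypothetical costs: a missed fraud is assumed to cost far
# more than one manual review.
y_true = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
print(pick_threshold(y_true, scores, cost_fp=5, cost_fn=200))
print(pick_threshold(y_true, scores, cost_fp=5, cost_fn=3))
```

When a missed fraud is much more expensive than a review, the chosen threshold moves lower to capture more fraud. When reviews are relatively expensive, it moves higher. In practice the result would also be checked against the precision and recall targets in the success criteria.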