PayWave processes roughly 8 million card transactions per day. Fraud is rare but costly, and the risk team needs a model that identifies suspicious transactions for manual review without overwhelming analysts with false positives.
You are given a historical transaction dataset for binary classification. Its 40 features fall into five groups:

| Feature Group | Count | Examples |
|---|---|---|
| Transaction attributes | 12 | amount, merchant_category, entry_mode, device_type |
| Customer behavior | 9 | avg_txn_amount_7d, txn_count_24h, chargeback_count_90d |
| Merchant signals | 6 | merchant_risk_score, country_match, prior_fraud_rate |
| Temporal/location | 5 | hour_of_day, day_of_week, geo_distance_from_home |
| Derived features | 8 | amount_vs_customer_avg, rapid_repeat_flag, cross_border_flag |

Target label: is_fraud, which is 1 if the transaction was confirmed fraudulent within 30 days, else 0.

A good solution should improve minority-class detection while keeping review volume manageable. Target at least 70% recall on fraud cases with precision >= 15% on the flagged set, and clearly justify threshold selection. Accuracy alone is not acceptable.
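
One way to justify a threshold against the recall and precision targets is to sweep candidate thresholds over held-out model scores and keep the highest one that satisfies both constraints, since that one flags the fewest transactions. The sketch below is a minimal pure-Python illustration, not a prescribed method: the function names `pick_threshold` and `expected_daily_reviews` are hypothetical, and it assumes the model outputs a score where higher means more suspicious and that a transaction is flagged when its score is at or above the threshold.

```python
from typing import List, Optional, Tuple


def pick_threshold(
    y_true: List[int],
    scores: List[float],
    min_recall: float = 0.70,
    min_precision: float = 0.15,
) -> Optional[Tuple[float, float, float, int]]:
    """Return (threshold, precision, recall, n_flagged) for the highest
    threshold that meets both targets, or None if none does.

    Assumption: a transaction is flagged when its score is >= threshold.
    """
    total_fraud = sum(y_true)
    if total_fraud == 0:
        return None  # recall is undefined without any fraud cases

    # Rank transactions from most to least suspicious.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)

    true_pos = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Flag every transaction tied at this score before evaluating.
        while i < len(ranked) and ranked[i][0] == threshold:
            true_pos += ranked[i][1]
            i += 1
        n_flagged = i
        precision = true_pos / n_flagged
        recall = true_pos / total_fraud
        # Thresholds are visited from high to low, so the first hit
        # flags the fewest transactions among all qualifying thresholds.
        if recall >= min_recall and precision >= min_precision:
            return threshold, precision, recall, n_flagged
    return None


def expected_daily_reviews(
    daily_txns: float, fraud_rate: float, recall: float, precision: float
) -> float:
    """Flagged transactions per day implied by a recall/precision pair.

    fraud_rate is an assumed input; the brief does not state it.
    """
    caught_fraud = daily_txns * fraud_rate * recall
    return caught_fraud / precision
```

The threshold would normally be chosen on a validation split and then confirmed on a separate test split, so the reported precision and recall are not optimistic. The second helper translates a precision/recall pair into analyst workload: under a purely hypothetical fraud rate of 0.1%, exactly 70% recall at 15% precision on 8 million daily transactions implies roughly 37,000 flagged transactions per day.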