PayLink processes roughly 8 million card transactions per day. Fraud is rare but expensive, so the risk team needs a model that catches fraudulent transactions without overwhelming manual reviewers with false positives.
You are given a historical transaction dataset for binary classification: predict whether a transaction is fraudulent (is_fraud=1) or legitimate (is_fraud=0). The data is sampled from 6 months of production traffic.
| Feature Group | Count | Examples |
|---|---|---|
| Transaction attributes | 10 | amount, merchant_category, payment_method, device_type |
| Customer behavior | 8 | transactions_24h, avg_amount_30d, chargebacks_90d, account_age_days |
| Velocity and risk signals | 7 | ip_risk_score, distance_from_home_km, failed_logins_7d, new_device_flag |
| Temporal/context | 5 | hour_of_day, day_of_week, is_weekend, country, currency |
Some features have missing values: distance_from_home_km, ip_risk_score (7% missing), and, sparsely, merchant metadata.

A good solution should achieve recall >= 0.85 on fraud while keeping precision >= 0.25 at the operating threshold. The model should also improve ranking quality enough to support manual review queues.
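The recall and precision targets can be checked directly against a model's scores on held-out data. Precision >= 0.25 means that, among flagged transactions, there are at most three legitimate ones for every fraudulent one. The sketch below is one possible way to pick an operating threshold, not a prescribed method: the function name `pick_threshold`, the "flag when score >= threshold" rule, and the choice of the highest feasible threshold (which keeps the review queue as small as possible) are all assumptions made for illustration, and the toy labels and scores are invented.

```python
from typing import List, Optional, Tuple


def pick_threshold(
    y_true: List[int],
    scores: List[float],
    min_recall: float = 0.85,
    min_precision: float = 0.25,
) -> Optional[Tuple[float, float, float]]:
    """Return (threshold, precision, recall) for the highest threshold that
    meets both targets, or None if no threshold does.

    Assumption: a transaction is flagged when its score >= threshold.
    """
    total_fraud = sum(y_true)
    if total_fraud == 0:
        return None  # recall is undefined without any fraud examples

    # Walk from the highest score down, so the first feasible threshold
    # is the one that flags the fewest transactions.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    tp = 0
    fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Tied scores are flagged together, so consume them as one group.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        precision = tp / (tp + fp)
        recall = tp / total_fraud
        if recall >= min_recall and precision >= min_precision:
            return threshold, precision, recall
    return None


# Toy validation data (invented for illustration): 4 frauds among 10 rows.
toy_labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
toy_scores = [0.95, 0.90, 0.85, 0.80, 0.70, 0.60, 0.55, 0.40, 0.30, 0.10]

result = pick_threshold(toy_labels, toy_scores)
if result is None:
    print("No threshold meets both targets")
else:
    threshold, precision, recall = result
    print(f"threshold={threshold:.2f} precision={precision:.3f} recall={recall:.2f}")
    # prints: threshold=0.55 precision=0.571 recall=1.00
```

With only four fraud cases in the toy data, recall >= 0.85 requires catching all four, so the threshold drops to the lowest-scoring fraud (0.55). If the function returns None on real validation data, the model cannot meet both targets at any threshold and the ranking itself needs to improve.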