Business Context
PayFlow processes roughly 8 million card transactions per day across online and in-store merchants. The risk team wants a binary classification model that flags potentially fraudulent transactions quickly enough to support near-real-time review while minimizing unnecessary declines for legitimate customers.
Dataset
You are given a labeled transaction dataset collected over the last 6 months.
| Feature Group | Feature Count | Examples |
|---|---|---|
| Transaction attributes | 12 | amount, currency, merchant_category, channel, card_present |
| Customer behavior | 9 | avg_txn_amount_7d, txn_count_24h, unique_merchants_30d |
| Device & network | 8 | device_id_hash, ip_risk_score, geo_distance_from_home |
| Merchant metadata | 6 | merchant_risk_score, chargeback_rate_90d, country |
| Temporal features | 5 | hour_of_day, day_of_week, seconds_since_last_txn |
- Size: 2.4M transactions, 40 engineered/input features
- Target: Fraud label from confirmed chargebacks and manual investigations
- Class balance: Highly imbalanced; fraud is rare
- Missing data: Some device and merchant fields are missing for new users, guest checkout, or long-tail merchants
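Missing device and merchant fields may be informative in their own right, since guest checkouts and brand-new users can behave differently from established customers. A minimal preprocessing sketch, assuming scikit-learn and an illustrative subset of the columns listed above: numeric features are median-imputed with an appended missingness indicator and then standardized, while categorical features get an explicit `__missing__` level before one-hot encoding. The column lists, `build_preprocessor`, and the `__missing__` token are hypothetical choices, not part of the dataset specification.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical subset of the 40 features; extend with the full column lists.
NUMERIC_COLS = ["amount", "txn_count_24h", "ip_risk_score", "merchant_risk_score"]
CATEGORICAL_COLS = ["merchant_category", "channel", "country"]


def build_preprocessor() -> ColumnTransformer:
    """Impute, encode, and scale mixed numeric/categorical transaction features."""
    numeric = Pipeline([
        # Median imputation; add_indicator appends a 0/1 "was missing" column for
        # every numeric feature that had missing values during fit.
        ("impute", SimpleImputer(strategy="median", add_indicator=True)),
        ("scale", StandardScaler()),
    ])
    categorical = Pipeline([
        # Missing categories become an explicit level instead of dropped rows.
        ("impute", SimpleImputer(strategy="constant", fill_value="__missing__")),
        # Categories unseen during fit (e.g. new merchants) encode as all zeros.
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ])
    # sparse_threshold=0.0 forces a dense output array.
    return ColumnTransformer(
        [("num", numeric, NUMERIC_COLS), ("cat", categorical, CATEGORICAL_COLS)],
        sparse_threshold=0.0,
    )


if __name__ == "__main__":
    demo = pd.DataFrame({
        "amount": [10.0, np.nan, 250.0],
        "txn_count_24h": [1, 3, 2],
        "ip_risk_score": [0.1, 0.9, np.nan],
        "merchant_risk_score": [0.3, 0.2, 0.1],
        "merchant_category": ["grocery", "travel", np.nan],
        "channel": ["web", "pos", "web"],
        "country": ["US", np.nan, "GB"],
    })
    print(build_preprocessor().fit_transform(demo).shape)
```

High-cardinality fields such as device_id_hash are deliberately left out of the one-hot branch; encoding them directly would produce an impractically wide matrix, so they are usually better summarized as aggregate or risk-score features.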
Success Criteria
A good solution should:
- achieve recall ≥ 85% on fraudulent transactions,
- maintain precision ≥ 20% at the chosen operating threshold,
- and deliver p95 inference latency under 50 ms per transaction in batch or online scoring.
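The recall and precision targets interact through the decision threshold. One way to pick it, sketched under the assumption that the model outputs a fraud score and that the threshold is tuned on a held-out validation window (never the test window): scan the precision-recall curve, keep only thresholds that reach 85% recall, take the one with the highest precision, and then check whether that precision clears 20%. `choose_threshold` is a hypothetical helper built on scikit-learn's `precision_recall_curve`.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve


def choose_threshold(y_true, scores, target_recall=0.85, min_precision=0.20):
    """Return (threshold, precision, recall, meets_both) for the best threshold.

    "Best" means the highest precision among thresholds whose recall is at
    least target_recall; ties go to the lowest such threshold.
    """
    precision, recall, thresholds = precision_recall_curve(y_true, scores)
    # precision/recall end with an extra (precision=1, recall=0) point that has
    # no threshold; drop it so all three arrays align.
    precision, recall = precision[:-1], recall[:-1]
    # Mask out thresholds that miss the recall target, then maximize precision.
    eligible_precision = np.where(recall >= target_recall, precision, -1.0)
    best = int(np.argmax(eligible_precision))
    p, r = float(precision[best]), float(recall[best])
    return float(thresholds[best]), p, r, p >= min_precision
```

If the returned flag is `False`, no threshold satisfies both criteria on that validation window, which points to improving the model or features rather than adjusting the threshold further.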
Constraints
- False negatives are expensive because missed fraud leads to chargebacks and customer support costs.
- False positives are also costly because they block legitimate purchases and hurt conversion.
- The model must be explainable enough for analysts to review flagged transactions.
- Training data is time-dependent, so leakage from future behavior must be avoided.
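Because both behavioral features and chargeback-based labels depend on when events occur, a random row-level split would let the model learn from transactions that happen after the ones it is evaluated on. A minimal chronological split, assuming each row carries a transaction timestamp column (called `event_time` here, a hypothetical name): training data ends before the validation window, validation ends before the test window, and an optional `gap` drops the rows just before each cutoff. The gap is an assumption motivated by label delay: chargebacks can arrive weeks after the transaction, so the most recent rows may still carry immature labels. `time_split` is a hypothetical helper.

```python
import pandas as pd


def time_split(df, time_col, val_start, test_start, gap=pd.Timedelta(0)):
    """Chronological train/validation/test split with an optional buffer.

    train: time <  val_start - gap
    val:   val_start <= time < test_start - gap
    test:  time >= test_start
    """
    t = pd.to_datetime(df[time_col])
    val_start, test_start = pd.Timestamp(val_start), pd.Timestamp(test_start)
    train = df[t < val_start - gap]
    val = df[(t >= val_start) & (t < test_start - gap)]
    test = df[t >= test_start]
    return train, val, test
```

With 6 months of data, for example, roughly the first four months could serve as training, the fifth as validation, and the sixth as the untouched test window. The split alone does not prevent leakage inside features: rolling aggregates such as txn_count_24h must be computed only from transactions strictly before the one being scored.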
Deliverables
- Propose a classification approach, including baseline and primary model.
- Describe preprocessing and feature engineering for mixed numeric/categorical data.
- Define a leakage-safe train/validation/test strategy.
- Choose evaluation metrics and thresholding logic for imbalanced fraud data.
- Provide production-ready Python code to train, evaluate, and score the model.
- Explain deployment, monitoring, and retraining recommendations.