Business Context
PayLink processes roughly 12 million card transactions per day for mid-market e-commerce merchants. The fraud operations team needs a model that flags fraudulent transactions in near real time. Fraud is extremely rare: the positive class makes up only 0.1% of all labeled examples.
Dataset
You are given a historical transaction dataset for supervised binary classification.
| Feature Group | Count | Examples |
|---|---|---|
| Transaction attributes | 14 | amount, currency, merchant_category, payment_method, device_type |
| User behavior | 11 | transactions_1h, avg_amount_7d, failed_attempts_24h, account_age_days |
| Risk signals | 9 | ip_country_mismatch, velocity_score, email_domain_risk, prior_chargebacks |
| Temporal/context | 8 | hour_of_day, day_of_week, holiday_flag, merchant_region |
- Size: 8.4M transactions, 42 engineered and raw features
- Target: is_fraud (1 = confirmed fraud, 0 = legitimate)
- Class balance: 0.1% positive, 99.9% negative
- Missing data: 6% missing in device fingerprint fields, 18% missing in historical behavior features for new users
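The missing-data note above suggests one common treatment, sketched here under assumptions: rows are plain Python dicts with `None` for missing values, and the helper names (`fit_imputer`, `transform`, `positive_class_weight`) are hypothetical. Imputation statistics are learned from the training window only, so no information from later data leaks into training. An explicit `<col>_missing` flag is kept because absent behavior history may itself be a signal for new accounts.

```python
from statistics import median


def fit_imputer(train_rows, columns):
    """Learn per-column medians from the training window only (no future data)."""
    medians = {}
    for col in columns:
        observed = [r[col] for r in train_rows if r.get(col) is not None]
        medians[col] = median(observed) if observed else 0.0
    return medians


def transform(rows, medians):
    """Fill missing values with training medians and add a <col>_missing indicator."""
    out = []
    for r in rows:
        new = dict(r)
        for col, fill in medians.items():
            is_missing = r.get(col) is None
            new[col + "_missing"] = int(is_missing)
            if is_missing:
                new[col] = fill
        out.append(new)
    return out


def positive_class_weight(labels):
    """Ratio of negatives to positives; roughly 999 at a 0.1% positive rate."""
    pos = sum(labels)
    if pos == 0:
        raise ValueError("need at least one positive label")
    return (len(labels) - pos) / pos
```

At a 0.1% positive rate the negative-to-positive ratio is about 999, which is one natural starting value for a positive-class weight; it is a tuning choice rather than a fixed rule.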
Success Criteria
A good solution should:
- achieve recall ≥ 75% on fraudulent transactions,
- maintain precision ≥ 10% at the operating threshold,
- improve analyst efficiency with lift > 20x in the top 0.5% of scored transactions (the fraud rate in that slice is more than 20 times the overall fraud rate).
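The three success criteria can be checked with a few lines of code. This is a minimal stdlib-only sketch: the function names are hypothetical, and `scores` are assumed to be model probabilities aligned with 0/1 `labels` from a held-out evaluation set, with a transaction flagged when its score is at or above the threshold.

```python
def precision_recall_at(scores, labels, threshold):
    """Precision and recall when flagging every transaction with score >= threshold."""
    tp = fp = fn = 0
    for s, y in zip(scores, labels):
        flagged = s >= threshold
        if flagged and y == 1:
            tp += 1
        elif flagged:
            fp += 1
        elif y == 1:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def lift_at_top(scores, labels, fraction=0.005):
    """Fraud rate in the top `fraction` of scores divided by the overall fraud rate."""
    n_top = max(1, round(len(scores) * fraction))
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    top_rate = sum(y for _, y in ranked[:n_top]) / n_top
    base_rate = sum(labels) / len(labels)
    return top_rate / base_rate if base_rate else float("inf")


def meets_criteria(scores, labels, threshold):
    """Check recall >= 75%, precision >= 10%, and lift > 20x in the top 0.5%."""
    p, r = precision_recall_at(scores, labels, threshold)
    return r >= 0.75 and p >= 0.10 and lift_at_top(scores, labels) > 20
```

With a 0.1% base rate, a lift of 20x means the top 0.5% slice must have a fraud rate above 2%.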
Constraints
- Online inference latency must stay under 50 ms per transaction.
- The fraud team needs feature-level explanations for flagged transactions.
- False positives are costly because they block legitimate payments.
- Training can run daily; scoring must support real-time serving.
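One way to meet both the latency and the explanation constraints is an additive explanation computed alongside the score. The sketch below assumes a logistic regression baseline with hypothetical weights, bias, and training-set feature means. Each feature's contribution is its weight times its deviation from the mean, and the contributions sum exactly to the difference between the transaction's logit and the reference logit. For a tree-ensemble production candidate, a technique such as TreeSHAP would play the same role; that is a substitution, not something the brief specifies.

```python
import math


def score_and_explain(x, weights, bias, means, top_k=3):
    """Score one transaction with a logistic model and return its top feature contributions.

    contribution_j = w_j * (x_j - mean_j). The reference logit (bias + sum w_j * mean_j)
    plus all contributions equals the transaction's logit exactly.
    """
    contributions = {f: w * (x[f] - means[f]) for f, w in weights.items()}
    logit = bias + sum(w * x[f] for f, w in weights.items())
    # Numerically stable sigmoid: avoids overflow in exp() for large |logit|.
    if logit >= 0:
        prob = 1.0 / (1.0 + math.exp(-logit))
    else:
        z = math.exp(logit)
        prob = z / (1.0 + z)
    # Rank features by the magnitude of their push toward or away from fraud.
    top = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)[:top_k]
    return prob, top
```

The cost is a single pass over the features, so the explanation adds little to per-transaction latency.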
Deliverables
- Propose a modeling approach for extreme class imbalance (0.1% positive rate).
- Describe preprocessing, feature engineering, and leakage prevention.
- Train and evaluate a baseline and a stronger production candidate.
- Choose decision thresholds based on business tradeoffs, not accuracy (a model that labels every transaction legitimate is already 99.9% accurate).
- Explain how you would monitor precision, recall, drift, and calibration after deployment.
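Two of the deliverables, threshold selection and drift monitoring, lend themselves to short sketches. `pick_threshold` (a hypothetical name) walks down the ranked scores and keeps the threshold with the highest recall whose precision stays at or above the 10% floor, preferring the higher threshold when recall ties so that fewer legitimate payments are blocked. `psi` computes the Population Stability Index of a feature or score distribution against a reference window using quantile bins; the bin count and the epsilon floor for empty bins are assumptions.

```python
import math
from bisect import bisect_right


def pick_threshold(scores, labels, min_precision=0.10):
    """Return (threshold, precision, recall) with max recall subject to a precision floor.

    A transaction is flagged when score >= threshold. Returns None if no threshold
    reaches min_precision.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    tp = fp = 0
    best = None
    i = 0
    while i < len(ranked):
        t = ranked[i][0]
        # Consume every example tied at this score: they are flagged together.
        while i < len(ranked) and ranked[i][0] == t:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        precision = tp / (tp + fp)
        recall = tp / total_pos
        # Recall only grows as the threshold drops; keep the highest threshold per recall level.
        if precision >= min_precision and (best is None or recall > best[2]):
            best = (t, precision, recall)
    return best


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index of `actual` against the reference sample `expected`."""
    ref = sorted(expected)
    # Quantile cut points of the reference window (bins - 1 interior edges).
    edges = [ref[int(len(ref) * k / bins)] for k in range(1, bins)]

    def shares(values):
        counts = [0] * bins
        for v in values:
            counts[bisect_right(edges, v)] += 1
        # Floor empty bins at eps so the log term stays finite.
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

Thresholds are best chosen on a validation window that is later in time than the training data, to mirror production. For PSI, values above roughly 0.25 are commonly treated as a significant shift, though that cutoff is a rule of thumb rather than a standard.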