PayLink, a digital payments platform processing 12M transactions per month, wants to improve account-level fraud detection. The risk team believes that, beyond standard tabular features, the relationships between users, devices, cards, merchants, and IPs carry useful signal that flat features alone do not capture.
You are given a heterogeneous graph built from 6 months of payment activity. Nodes represent accounts, devices, cards, merchants, emails, phone numbers, and IP addresses. Edges represent relationships such as used_device, used_card, transacted_with, logged_in_from_ip, and shares_phone.
| Component | Size | Examples |
|---|---|---|
| Account nodes | 420K | account_age_days, kyc_level, country |
| Entity nodes | 1.8M | device_id, hashed_email, card_bin, merchant_id, IP |
| Edges | 9.6M | account-device, account-card, account-IP, account-merchant |
| Labels | 420K accounts | fraud_reported_30d (binary) |
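
To make the idea of relational signal concrete, here is a minimal sketch of one simple graph-derived encoding: for each account, the number of other accounts that share at least one entity with it (same device, same card, and so on). The toy edge list, the IDs, and the function name `shared_entity_neighbors` are hypothetical and not part of the PayLink data. This is only an illustration of the kind of feature a flat table would miss, not a proposed solution or a learned embedding.

```python
from collections import defaultdict

# Hypothetical toy edge list: (account_id, relation, entity_id).
# Relation names follow the brief; the IDs are invented for illustration.
edges = [
    ("acct_1", "used_device", "dev_A"),
    ("acct_2", "used_device", "dev_A"),
    ("acct_2", "used_card", "card_X"),
    ("acct_3", "used_card", "card_X"),
    ("acct_4", "logged_in_from_ip", "ip_9"),
]


def shared_entity_neighbors(edges):
    """Map each account to the set of other accounts sharing any entity with it."""
    # Group accounts by the (relation, entity) pair they are attached to.
    entity_to_accounts = defaultdict(set)
    for account, relation, entity in edges:
        entity_to_accounts[(relation, entity)].add(account)

    # Two accounts are neighbors if they appear in the same group.
    neighbors = defaultdict(set)
    for accounts in entity_to_accounts.values():
        for account in accounts:
            neighbors[account] |= accounts - {account}
    return dict(neighbors)


neighbors = shared_entity_neighbors(edges)

# One graph-derived feature per account: count of accounts linked via a shared entity.
shared_counts = {account: len(linked) for account, linked in neighbors.items()}
print(shared_counts)
```

A count like this could be joined onto the tabular features as one extra column. On the full graph, very common entities (for example a public IP or a large merchant) would link many unrelated accounts, so some degree cap or down-weighting would likely be needed; that choice is left open here.
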
A good solution should show that graph-derived embeddings or encodings improve fraud ranking over a tabular-only baseline. Target AUC-ROC > 0.88, PR-AUC > 0.30, and recall > 0.70 at precision >= 0.25.
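
The third target, recall at a precision floor, is less standard than the two AUC metrics, so a sketch of one way to compute it may help. The function name `recall_at_precision` and the toy labels and scores below are assumptions for illustration only. The sketch sweeps score thresholds from highest to lowest and reports the best recall among thresholds whose precision meets the floor. AUC-ROC and PR-AUC can be obtained separately, for example with scikit-learn's `roc_auc_score` and `average_precision_score`.

```python
def recall_at_precision(y_true, scores, min_precision=0.25):
    """Best recall over all score thresholds with precision >= min_precision."""
    total_pos = sum(y_true)
    if total_pos == 0:
        raise ValueError("y_true contains no positive labels")

    # Rank accounts from most to least suspicious.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)

    best_recall = 0.0
    true_pos = 0
    i = 0
    n = len(ranked)
    while i < n:
        # Consume all accounts tied at this score so each threshold is well defined.
        j = i
        while j < n and ranked[j][0] == ranked[i][0]:
            true_pos += ranked[j][1]
            j += 1
        precision = true_pos / j  # j accounts are flagged at this threshold
        recall = true_pos / total_pos
        if precision >= min_precision:
            best_recall = max(best_recall, recall)
        i = j
    return best_recall


# Toy example: 13 accounts, 3 fraudulent, ranked at positions 1, 3, and 13.
y_true = [1, 0, 1] + [0] * 9 + [1]
scores = [float(13 - k) for k in range(13)]

recall = recall_at_precision(y_true, scores, min_precision=0.25)
print(f"recall at precision >= 0.25: {recall:.3f}")
```

In this toy ranking, catching the third fraud case would require flagging all 13 accounts, which drops precision to 3/13 and below the floor, so the reported recall is 2/3 and the 0.70 target is missed.
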