You are designing a real-time ML system for a digital payments platform that scores every card transaction for fraud before authorization. The score is used to approve, decline, or step up (require additional authentication for) each transaction, so the model directly affects both fraud losses and customer conversion. Fraud patterns shift quickly, labels arrive late because they depend on chargebacks and investigations, and the business wants decisions to incorporate the latest user and merchant behavior. The system must serve low-latency predictions globally while remaining robust to drift, outages, and feature inconsistencies.

| Signal | Value |
|---|---|
| Daily active cardholders | 18M |
| Peak transaction scoring QPS | 45K |
| Average transaction scoring QPS | 18K |
| Distinct merchants | 9M |
| User + merchant feature lookups per request | 40-80 |
| End-to-end decision latency budget (p99) | 120 ms |
| Fraud label delay | 7-45 days |

How would you design this end-to-end system so it can make accurate real-time predictions at this scale while handling delayed labels, feature freshness, training-serving skew, and operational failures?
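
As a starting point for sizing, the figures in the table can be turned into a few rough capacity numbers. The sketch below is only back-of-envelope arithmetic under stated assumptions: every feature lookup is counted as a separate read (no batching or caching), average QPS is treated as steady across the day, the p99 budget is used as an upper bound on per-request latency, and a transaction's label is counted as pending until the label delay has elapsed. The function name `capacity_estimates` and the dictionary keys are illustrative, not part of the scenario.

```python
"""Back-of-envelope sizing from the scenario's table (a sketch, not a design)."""

SECONDS_PER_DAY = 86_400

# Figures copied from the table.
PEAK_QPS = 45_000
AVG_QPS = 18_000
LOOKUPS_PER_REQUEST = (40, 80)  # (min, max) user + merchant lookups
P99_BUDGET_MS = 120
LABEL_DELAY_DAYS = (7, 45)      # (min, max) fraud label delay


def capacity_estimates() -> dict:
    """Derive rough load figures; all assumptions are noted inline."""
    min_lookups, max_lookups = LOOKUPS_PER_REQUEST
    min_delay, max_delay = LABEL_DELAY_DAYS

    # Assumption: average QPS holds for the whole day.
    transactions_per_day = AVG_QPS * SECONDS_PER_DAY

    return {
        # Assumption: one read per lookup, no batching or caching.
        "peak_feature_reads_per_s": (PEAK_QPS * min_lookups,
                                     PEAK_QPS * max_lookups),
        "peak_to_avg_ratio": PEAK_QPS / AVG_QPS,
        "transactions_per_day": transactions_per_day,
        # Little's law (in-flight = arrival rate x latency), using the
        # p99 budget as an upper bound on latency.
        "max_in_flight_at_peak": PEAK_QPS * P99_BUDGET_MS / 1000,
        # Assumption: a label counts as pending until the delay has elapsed.
        "txns_with_pending_labels": (transactions_per_day * min_delay,
                                     transactions_per_day * max_delay),
    }


if __name__ == "__main__":
    for name, value in capacity_estimates().items():
        print(f"{name}: {value}")
```

These figures only bound the problem. A real design would refine them with measured traffic shape, cache hit rates, and the actual distribution of label arrival times.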