Product Context
PayShield is a global card issuer and payment processor. Every card authorization request must be scored for fraud risk before the issuer approves, declines, or routes the transaction for step-up verification.
Scale
| Signal | Value |
|---|---|
| Active cardholders | 45M |
| Daily active cards | 12M |
| Peak authorization QPS | 35K |
| Average authorization QPS | 12K |
| Transaction volume | ~1B/month (~33M/day) |
| Merchant locations / terminals | 18M |
| Latency budget (p99, end-to-end) | 200ms |
| Chargeback / confirmed fraud delay | 2-45 days |
Task
Design an end-to-end ML system that scores every card authorization for fraud risk and returns a decision within 200ms (p99). Your design should address:
- How you would define the prediction target, actions, and decision thresholds for approve / decline / review / step-up authentication
- The full architecture from data ingestion and feature computation to online scoring and feedback loops
- Whether you would use a multi-stage system (for example, rules/filtering → lightweight model → heavier model → policy layer), and how each stage fits the latency budget
- Model choices, feature store design, and how to handle delayed labels, class imbalance, and concept drift
- Offline evaluation, online experimentation, and how you would monitor calibration, false positives, and business impact
- Top failure modes, including training-serving skew, feature drift, outages, and adversarial adaptation
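
To make the first and third bullets concrete, here is a minimal sketch of one possible framing, not a reference answer. It assumes the model emits a calibrated fraud probability, that the four actions are ordered by increasing risk, and that each stage gets a fixed slice of the 200ms budget. The threshold values, the stage names, and the per-stage millisecond allocations are all hypothetical placeholders; only the 200ms total comes from the prompt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    # Hypothetical cut-offs on a calibrated fraud probability. Real values
    # would be tuned against decline cost, fraud loss, and review capacity.
    step_up: float = 0.20
    review: float = 0.50
    decline: float = 0.90


def decide(score: float, t: Thresholds = Thresholds()) -> str:
    """Map a fraud probability in [0, 1] to one of four actions."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be a probability in [0, 1]")
    if score >= t.decline:
        return "decline"
    if score >= t.review:
        return "review"
    if score >= t.step_up:
        return "step_up"
    return "approve"


# Hypothetical per-stage latency allocations in milliseconds.
STAGE_BUDGET_MS = {
    "network_overhead": 40,
    "feature_lookup": 30,
    "rules_filter": 10,
    "lightweight_model": 20,
    "heavier_model": 60,
    "policy_layer": 10,
}
TOTAL_BUDGET_MS = 200  # end-to-end p99 budget stated in the prompt


def headroom_ms(budget: dict = STAGE_BUDGET_MS, total: int = TOTAL_BUDGET_MS) -> int:
    """Return the unallocated part of the budget under a simple additive model."""
    return total - sum(budget.values())


if __name__ == "__main__":
    for s in (0.05, 0.30, 0.60, 0.95):
        print(s, decide(s))
    print("headroom (ms):", headroom_ms())
```

The additive budget is only a rough planning device: per-stage tail latencies do not combine by simple addition, so a real design would validate the end-to-end p99 by measurement.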
Constraints
- The system must return a decision within 200ms p99, including network overhead and feature lookups
- False positives are expensive: unnecessary declines hurt customer trust and revenue
- Fraud labels are delayed and partially observed; many events are only weakly labeled at authorization time
- The system must satisfy PCI/compliance constraints and minimize use of raw PII in training and serving
- The fraud strategy team needs interpretable reason codes for declines and step-up actions
- Traffic patterns shift sharply during holidays, merchant outages, and coordinated fraud attacks
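
The reason-code constraint can be illustrated with a small sketch. It assumes a logistic-regression scorer, where each feature's contribution to the logit is simply weight times value, and reports the largest positive contributions as reason codes. The feature names, weights, bias, and code strings are hypothetical; a tree-based or neural model would need a different attribution method.

```python
import math

# Hypothetical logistic-regression parameters over derived, non-PII features.
WEIGHTS = {
    "txn_amount_zscore": 0.8,
    "new_merchant": 1.2,
    "velocity_1h": 0.5,
    "card_present": -1.0,
}
BIAS = -4.0

# Hypothetical mapping from feature name to a strategy-team reason code.
REASON_CODES = {
    "txn_amount_zscore": "R01_UNUSUAL_AMOUNT",
    "new_merchant": "R02_NEW_MERCHANT",
    "velocity_1h": "R03_HIGH_VELOCITY",
}


def score_with_reasons(features: dict, top_k: int = 2):
    """Return (fraud probability, reason codes for the top risk-increasing features)."""
    # Per-feature contribution to the logit; missing features count as 0.
    contributions = {
        name: weight * features.get(name, 0.0) for name, weight in WEIGHTS.items()
    }
    logit = BIAS + sum(contributions.values())
    prob = 1.0 / (1.0 + math.exp(-logit))
    # Keep only features that push the score up and have a defined code.
    risky = sorted(
        ((name, c) for name, c in contributions.items()
         if c > 0 and name in REASON_CODES),
        key=lambda item: item[1],
        reverse=True,
    )
    return prob, [REASON_CODES[name] for name, _ in risky[:top_k]]


if __name__ == "__main__":
    example = {
        "txn_amount_zscore": 3.0,
        "new_merchant": 1.0,
        "velocity_1h": 4.0,
        "card_present": 0.0,
    }
    print(score_with_reasons(example))
```

Computing reasons from the same contributions that produce the score keeps the explanation consistent with the decision, which is one way to approach the interpretability requirement.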