Design Real-Time Transaction Fraud Detection

Product Context

Voya Financial wants to score incoming financial transactions for fraud across retirement disbursements, account transfers, and linked payment activity in Voya's digital servicing surfaces. The system is used by members, contact-center agents, and risk operations teams, and must decide in real time whether to approve, step up authentication, hold for review, or decline.

Scale

Signal	Value
Registered members	9M
Monthly active digital users	3.5M
Transactions/day	18M
Peak transaction QPS	2,500
Peak feature-lookup QPS	25,000+
Historical labeled transactions	4.2B over 3 years
Fraud rate	~0.18% of transactions
End-to-end decision latency budget (p99)	120ms
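The table implies a few derived numbers that drive capacity planning and class-imbalance handling. The sketch below recomputes them from the table values only. It assumes a uniform day for the average rate, and it assumes (not stated in the table) that the ~0.18% fraud rate also holds for the labeled history.

```python
# Values copied from the Scale table.
SECONDS_PER_DAY = 86_400
transactions_per_day = 18_000_000
peak_txn_qps = 2_500
peak_feature_lookup_qps = 25_000
fraud_rate = 0.0018  # "~0.18%", treated as exact for this estimate
historical_labeled = 4_200_000_000

# Derived sizing numbers.
avg_txn_qps = transactions_per_day / SECONDS_PER_DAY      # uniform-day average rate
peak_to_avg = peak_txn_qps / avg_txn_qps                  # burst headroom the serving tier needs
lookups_per_txn = peak_feature_lookup_qps / peak_txn_qps  # online feature reads per decision
fraud_per_day = transactions_per_day * fraud_rate         # expected positives per day
historical_positives = historical_labeled * fraud_rate    # positives available for training (assumed rate)
neg_per_pos = (1 - fraud_rate) / fraud_rate               # class-imbalance ratio

print(f"average txn QPS:        {avg_txn_qps:,.0f}")
print(f"peak / average:         {peak_to_avg:.1f}x")
print(f"feature reads per txn:  {lookups_per_txn:.0f}")
print(f"fraud txns per day:     {fraud_per_day:,.0f}")
print(f"historical positives:   {historical_positives:,.0f}")
print(f"negatives per positive: {neg_per_pos:,.0f}")
```

With these inputs the average rate is about 208 QPS, so the 2,500 QPS peak is roughly 12x the daily average; each decision makes about 10 feature reads; and there are about 32,400 expected fraud transactions per day against roughly 555 legitimate transactions per fraud case.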

Fraud labels are delayed and noisy: some chargebacks or confirmed fraud cases arrive days later, while many transactions have no explicit negative label. The business cares about reducing fraud loss without causing excessive false positives that block legitimate retirement and benefits activity.
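One common way to handle delayed labels is a maturity window: an unflagged transaction counts as a negative only after enough time has passed for chargebacks or fraud reports to arrive, and a fraud label is used only if it was already known at the training snapshot date. The sketch below assumes a hypothetical 30-day window and a hypothetical `LabeledTxn` record; neither comes from the problem statement.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical maturity window for chargebacks / confirmed-fraud reports.
MATURITY_WINDOW = timedelta(days=30)


@dataclass(frozen=True)
class LabeledTxn:
    txn_id: str
    event_time: datetime
    fraud_reported_at: Optional[datetime] = None  # when a fraud label arrived, if ever


def training_label(txn: LabeledTxn, as_of: datetime) -> Optional[int]:
    """1 = fraud, 0 = matured negative, None = too recent to label as of `as_of`."""
    # Only use fraud reports already known at snapshot time (avoids label leakage).
    if txn.fraud_reported_at is not None and txn.fraud_reported_at <= as_of:
        return 1
    # No report yet: trust the negative only once the window has fully elapsed.
    if as_of - txn.event_time >= MATURITY_WINDOW:
        return 0
    return None


as_of = datetime(2024, 6, 30)
txns = [
    LabeledTxn("old_clean", datetime(2024, 5, 1)),
    LabeledTxn("recent_clean", datetime(2024, 6, 25)),
    LabeledTxn("reported", datetime(2024, 6, 20), fraud_reported_at=datetime(2024, 6, 28)),
    LabeledTxn("reported_later", datetime(2024, 6, 10), fraud_reported_at=datetime(2024, 7, 5)),
]
labels = {t.txn_id: training_label(t, as_of) for t in txns}
print(labels)
```

Rows labeled `None` are excluded from that training snapshot. The window length trades label freshness against the risk that a late fraud report flips a transaction that was already used as a negative.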

Task

Design an end-to-end ML system for real-time fraud detection at Voya Financial. Address the following:

Clarify the product requirements, decision actions, and key business tradeoffs between fraud capture and customer friction.
Propose a multi-stage architecture for real-time scoring, including fast candidate/rule gating, ML ranking/scoring, and a final policy or re-ranking layer for actioning.
Design the offline and online data architecture: feature computation, feature store, training cadence, label generation, and feedback loop.
Choose models for each stage and justify them under class imbalance, delayed labels, and strict latency constraints.
Define offline evaluation, online rollout, monitoring, and alerting, including calibration, drift, and training-serving skew.
Identify major failure modes and how the system should fail safely under outages or degraded dependencies.
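A minimal sketch of the three stages named in the task, assuming hypothetical rules, features, and thresholds: a cheap deterministic rule gate that can short-circuit, a pluggable model score (stubbed here by a hypothetical `toy_score`, not a real fraud model), and a policy layer that maps a calibrated fraud probability to one of the four actions from the product context. Real thresholds would come from the tradeoff between fraud loss and customer friction, not from the fixed values used here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Action(str, Enum):
    APPROVE = "approve"
    STEP_UP = "step_up"
    HOLD = "hold"
    DECLINE = "decline"


@dataclass(frozen=True)
class Txn:
    amount: float
    is_new_payee: bool
    device_txn_count_1h: int  # near-real-time velocity feature (hypothetical)
    ip_on_blocklist: bool


def rule_gate(txn: Txn) -> Optional[Action]:
    # Stage 1: cheap deterministic rules that can short-circuit the model.
    if txn.ip_on_blocklist:
        return Action.DECLINE
    if txn.amount < 50 and not txn.is_new_payee:
        return Action.APPROVE
    return None  # no rule fired; continue to the model


# Stage 3: ordered (threshold, action) pairs over a calibrated fraud probability.
POLICY = [(0.90, Action.DECLINE), (0.50, Action.HOLD), (0.10, Action.STEP_UP)]


def policy(p_fraud: float) -> Action:
    for threshold, action in POLICY:
        if p_fraud >= threshold:
            return action
    return Action.APPROVE


def decide(txn: Txn, score: Callable[[Txn], float]) -> Action:
    gated = rule_gate(txn)
    if gated is not None:
        return gated
    return policy(score(txn))  # Stage 2 (model) feeds Stage 3 (policy)


def toy_score(txn: Txn) -> float:
    # Stand-in for the ML model so the example runs end to end.
    return min(1.0, 0.02 * txn.device_txn_count_1h + (0.2 if txn.is_new_payee else 0.0))


for t in [
    Txn(30.0, False, 1, False),
    Txn(5000.0, True, 10, False),
    Txn(5000.0, True, 20, False),
    Txn(5000.0, True, 40, False),
]:
    print(t, decide(t, toy_score).value)
```

Keeping the policy table separate from the model lets thresholds be tuned, versioned, and audited without retraining the scorer.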

Constraints

Must support hard compliance and audit requirements: every decision must be explainable and reproducible.
PII usage is restricted; sensitive features require governed access and lineage.
Some features must be updated in near real time (device velocity, recent transfer count, IP risk), while others can be batch refreshed daily.
False positives are expensive: blocking a legitimate retirement withdrawal or rollover creates customer and regulatory risk.
The system must remain available during feature store or model service degradation, with deterministic fallback behavior.
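The audit and fallback constraints can meet in one decision path. The sketch below assumes hypothetical version strings, a hypothetical 1,000 amount threshold, and that any exception from the feature fetch or model call signals degradation (a production path would also enforce a deadline inside the 120ms budget). On failure it applies a fixed rule that steps up rather than declines, reflecting the cost of false positives, and every decision records the model and policy versions plus a SHA-256 hash of the canonical feature payload so the decision can be replayed.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Callable, Mapping, Optional

# Hypothetical version identifiers, recorded on every decision for audit replay.
MODEL_POLICY_VERSION = "score-policy-v1"
FALLBACK_POLICY_VERSION = "fallback-rules-v1"


@dataclass(frozen=True)
class DecisionRecord:
    txn_id: str
    action: str
    reason: str
    model_version: Optional[str]  # None when the model was not used
    policy_version: str
    features_hash: str


def hash_features(features: Mapping[str, object]) -> str:
    # Canonical JSON (sorted keys, fixed separators) so identical inputs hash identically.
    payload = json.dumps(dict(features), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def fallback_action(amount: float) -> str:
    # Deterministic degraded-mode rule; prefers step-up over decline (hypothetical threshold).
    return "approve" if amount < 1_000 else "step_up"


def decide_with_fallback(
    txn_id: str,
    amount: float,
    fetch_features: Callable[[str], Mapping[str, object]],
    score: Callable[[Mapping[str, object]], float],
    model_version: str,
    policy: Callable[[float], str],
) -> DecisionRecord:
    try:
        features = fetch_features(txn_id)
        p_fraud = score(features)
    except Exception as exc:  # feature store or model service degraded
        return DecisionRecord(
            txn_id=txn_id,
            action=fallback_action(amount),
            reason=f"fallback:{type(exc).__name__}",
            model_version=None,
            policy_version=FALLBACK_POLICY_VERSION,
            features_hash=hash_features({"amount": amount}),
        )
    return DecisionRecord(
        txn_id=txn_id,
        action=policy(p_fraud),
        reason=f"model_score={p_fraud:.4f}",
        model_version=model_version,
        policy_version=MODEL_POLICY_VERSION,
        features_hash=hash_features(features),
    )


# Usage with stub dependencies.
def ok_fetch(txn_id: str) -> Mapping[str, object]:
    return {"amount": 2500.0, "device_txn_count_1h": 3}


def down_fetch(txn_id: str) -> Mapping[str, object]:
    raise TimeoutError("feature store timeout")


def stub_score(features: Mapping[str, object]) -> float:
    return 0.07


def stub_policy(p_fraud: float) -> str:
    return "approve" if p_fraud < 0.10 else "hold"


normal = decide_with_fallback("t1", 2500.0, ok_fetch, stub_score, "gbdt-2024-06", stub_policy)
degraded = decide_with_fallback("t2", 2500.0, down_fetch, stub_score, "gbdt-2024-06", stub_policy)
```

Because the fallback rule and its version are fixed, the same inputs during the same outage always produce the same action, and the record shows which path made the decision.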
