Context
PayWave, a digital payments platform processing card-not-present transactions, currently runs hourly batch fraud scoring with Apache Spark on S3-backed transaction logs. Fraud analysts want sub-second blocking for high-risk payments, but the finance team still needs complete, reconciled datasets for investigations, chargebacks, and model retraining.
You need to design a fraud data pipeline and explain the trade-offs between batch and stream processing, including whether a single approach or a hybrid architecture best fits these requirements.
Scale Requirements
- Transaction volume: 120K transactions/second peak, 25K average
- Event size: ~1.5 KB JSON per authorization event
- Daily data volume: ~8 TB of raw transaction, device, and merchant events
- Decision latency: P95 < 300 ms for online fraud decisions
- Batch SLA: Reconciled fraud fact tables available within 30 minutes of hour close
- Retention: 13 months hot storage, 7 years archived for compliance
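A quick sanity check of the figures above, assuming ~1.5 KB per event and that the ~8 TB/day total includes device and merchant events beyond raw authorizations:

```python
# Back-of-envelope sizing derived from the stated scale requirements.
EVENT_BYTES = 1.5 * 1024          # ~1.5 KB JSON per authorization event
PEAK_TPS = 120_000                # peak transactions/second
AVG_TPS = 25_000                  # average transactions/second
SECONDS_PER_DAY = 86_400

# Peak ingress the streaming layer must absorb, in MB/s.
peak_mb_per_s = PEAK_TPS * EVENT_BYTES / 1024**2            # ~176 MB/s

# Daily volume from authorization events alone, in TB.
auth_tb_per_day = AVG_TPS * EVENT_BYTES * SECONDS_PER_DAY / 1024**4  # ~3 TB

print(f"peak ingress: {peak_mb_per_s:.0f} MB/s")
print(f"auth events alone: {auth_tb_per_day:.1f} TB/day")
```

Authorization events account for roughly 3 TB/day; the remainder of the ~8 TB comes from device fingerprint and merchant risk events.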
Requirements
- Design ingestion for real-time transaction, device fingerprint, and merchant risk events.
- Support online fraud scoring for payment authorization decisions with low latency.
- Build batch reconciliation to correct late, duplicated, or out-of-order events and produce investigation-ready tables.
- Define how features such as card velocity, merchant anomaly counts, and device reuse are computed in streaming vs batch.
- Ensure idempotent processing, replay capability, and auditable lineage from raw event to fraud decision.
- Describe orchestration, monitoring, and failure recovery for both real-time and batch paths.
- Explain the trade-offs in accuracy, latency, cost, operational complexity, and recovery when choosing batch, streaming, or hybrid.
Constraints
- Existing stack is AWS-first: MSK, S3, EMR, Airflow, Snowflake
- Incremental budget is capped at $40K/month
- PCI and SOC 2 controls apply; PII must be encrypted in transit and at rest
- Team has strong Spark/Airflow experience but limited expertise operating low-latency stateful streaming at scale