Business Context
You’re interviewing for a Senior ML Engineer role at SwiftPay, a global payment processor that handles ~35M card-not-present (CNP) transactions/day across North America and Europe. Fraud losses are trending upward after a product launch that reduced checkout friction. Each basis-point change in fraud rate moves $8–12M/year in chargebacks and operational costs. The fraud team needs a model that can score transactions in real time to decide whether to approve, decline, or step-up authenticate.
SwiftPay already has a strong gradient-boosted tree baseline, but leadership wants to understand whether a neural network can improve AUC-PR and recall at low FPR while maintaining tight latency and explainability requirements.
This question is intentionally framed like a classic interview prompt: “Discuss the architecture of a neural network you recently built. Why did you choose that specific topology?” Here, it is grounded in a realistic production setting.
Dataset
You are given a curated training set built from the last 90 days of traffic.
| Feature Group | Count | Examples | Notes |
|---|---|---|---|
| Transaction attributes | 18 | amount, currency, merchant_category, is_recurring, channel | Heavy-tailed amounts; some leakage-prone fields removed |
| Customer behavior aggregates | 22 | txns_1h, txns_24h, spend_7d, distinct_merchants_30d | Computed from event streams; must be reproducible online |
| Device / network | 14 | device_fingerprint_hash, ip_asn, ip_country, user_agent_family | High-cardinality categoricals |
| Merchant risk signals | 9 | merchant_chargeback_rate_30d, merchant_age_days | Slowly changing |
| Temporal | 6 | hour_of_day, day_of_week, holiday_flag | Strong seasonality |
- Size: ~120M labeled transactions, 69 features
- Target: Binary — fraud_chargeback_within_60d (1) vs non-fraud (0)
- Class balance: 0.35% positive (highly imbalanced)
- Missing data: ~8% missing in device/network features (ad blockers, privacy settings), ~3% missing in aggregates for new customers
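The behavioral aggregates must be computed from events strictly before the scored transaction, both in training and online, or they leak the label. A minimal sketch of a leakage-safe window count over a customer's sorted event history (function name and timestamps are illustrative, not from the prompt):

```python
import bisect

def txns_in_window(event_times, t, window_s):
    """Count prior events in (t - window_s, t), excluding the current
    transaction at time t, so the feature uses strictly past data only.
    `event_times` is the customer's sorted transaction timestamps (epoch s)."""
    lo = bisect.bisect_right(event_times, t - window_s)
    hi = bisect.bisect_left(event_times, t)  # exclude events at or after t
    return hi - lo

# Hypothetical customer history (epoch seconds)
history = [100, 3_500, 3_900, 4_000, 86_000]
print(txns_in_window(history, 4_000, 3_600))   # txns_1h at the 4,000 s txn -> 2
```

Because the same function runs against the event stream online and against historical events during training-set construction, it also satisfies the feature-store parity constraint below.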
Success Criteria
You are optimizing for business outcomes and operational constraints:
- Recall ≥ 70% at FPR ≤ 0.20% on a time-based holdout week (fraud ops can manually review only a small fraction of traffic).
- AUC-PR ≥ 0.20 (baseline is ~0.16 on the same split).
- p95 inference latency ≤ 15 ms per transaction on CPU (single request), including preprocessing.
- Calibrated probabilities: Expected Calibration Error (ECE) ≤ 0.02 so risk thresholds are stable.
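Both the recall-at-FPR-cap criterion and ECE can be computed directly from held-out scores. A dependency-free sketch, assuming `scores`/`probs` are per-transaction fraud probabilities and `labels` are 0/1 (the toy inputs below are illustrative):

```python
def recall_at_fpr(scores, labels, max_fpr=0.002):
    """Best recall achievable at any threshold whose FPR stays <= max_fpr.
    Sweeps thresholds in descending score order; returns (recall, threshold),
    where a transaction is flagged if score >= threshold."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    pos = sum(labels)
    neg = len(labels) - pos
    tp = fp = 0
    best_recall, best_thr = 0.0, 1.0
    for i in order:
        if labels[i] == 1:
            tp += 1
        else:
            fp += 1
        if fp / neg <= max_fpr and tp / pos > best_recall:
            best_recall, best_thr = tp / pos, scores[i]
    return best_recall, best_thr

def ece(probs, labels, n_bins=10):
    """Expected Calibration Error: |mean predicted prob - empirical fraud rate|
    per equal-width bin, weighted by bin occupancy."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        bins[min(int(p * n_bins), n_bins - 1)].append((p, y))
    total = len(probs)
    err = 0.0
    for b in bins:
        if b:
            avg_p = sum(p for p, _ in b) / len(b)
            frac_pos = sum(y for _, y in b) / len(b)
            err += len(b) / total * abs(avg_p - frac_pos)
    return err
```

In practice you would run these on the full time-based holdout week; at 0.35% positives, a cap of FPR ≤ 0.20% means flagging well under 1% of traffic.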
Constraints
- No data leakage: splits must be time-based; aggregates must be computed only from past data.
- Regulatory/compliance: must provide a reason-code-style explanation (top contributing features) for declines.
- Feature store parity: training features must match online features exactly.
- Model size: must fit in memory for a low-latency service (target < 50 MB).
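The 50 MB budget is worth sanity-checking before training, since embedding tables for the high-cardinality categoricals dominate parameter count. A back-of-envelope sketch where every cardinality, embedding dimension, and layer width is an illustrative assumption (none are given in the prompt):

```python
# Hypothetical embeddings + MLP topology; fp32 weights.
FLOAT_BYTES = 4

embedding_tables = {                      # (hashed cardinality, embedding dim)
    "device_fingerprint_hash": (100_000, 16),
    "ip_asn":                  (50_000, 12),
    "merchant_category":       (1_000, 8),
    "user_agent_family":       (5_000, 8),
}
emb_params = sum(card * dim for card, dim in embedding_tables.values())

# 69 raw features, 4 consumed by embeddings, replaced by their concatenated dims.
dense_in = 69 - len(embedding_tables) + sum(d for _, d in embedding_tables.values())
layers = [dense_in, 256, 128, 64, 1]      # illustrative MLP trunk
mlp_params = sum(a * b + b for a, b in zip(layers, layers[1:]))  # weights + biases

total = emb_params + mlp_params
print(f"{total:,} params = {total * FLOAT_BYTES / 1e6:.1f} MB fp32")
```

Even with generous hash bucket counts, this lands around 9 MB fp32, comfortably inside the 50 MB budget; it is the embedding cardinalities, not the MLP depth, that would blow it.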
Deliverables (what you must produce in the interview)
- Propose a neural network architecture/topology for this tabular + high-cardinality categorical problem (e.g., embeddings + MLP, wide & deep, DeepFM-style interactions, residual MLP).
- Explain why you chose that topology over alternatives (GBDT, logistic regression, pure MLP, tabular transformers, etc.), explicitly tying choices to:
  - imbalance and label noise (chargebacks)
  - feature types (numerical + categorical + aggregates)
  - latency and memory constraints
  - calibration and thresholding needs
- Describe your training recipe: loss function, sampling strategy, regularization, early stopping, and hyperparameter tuning.
- Define your evaluation plan: metrics, time splits, and how you would pick an operating threshold.
- Explain how you would make the model explainable enough for compliance (e.g., SHAP on a surrogate, integrated gradients, monotonic constraints via feature engineering, reason-code mapping).
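One reason-code mechanism worth sketching in the interview is occlusion-style attribution: re-score the transaction with each feature replaced by a neutral baseline and rank features by the resulting score drop, then map the top contributors to fixed codes. A hedged sketch with a stand-in linear scorer; the feature names, baselines, weights, and codes are all hypothetical:

```python
BASELINE = {"amount": 50.0, "txns_1h": 0.0, "ip_country_mismatch": 0.0}
REASON_CODES = {
    "amount": "R01: unusually high amount",
    "txns_1h": "R07: rapid transaction velocity",
    "ip_country_mismatch": "R12: IP / billing country mismatch",
}

def score(x):
    """Stand-in risk score; in production this calls the served network."""
    return 0.002 * x["amount"] + 0.05 * x["txns_1h"] + 0.3 * x["ip_country_mismatch"]

def reason_codes(x, top_k=2):
    """Rank features by score drop when each is occluded with its baseline;
    return the codes for the top_k features that actually raised the score."""
    base = score(x)
    drops = {name: base - score({**x, name: neutral})
             for name, neutral in BASELINE.items()}
    top = sorted(drops, key=drops.get, reverse=True)[:top_k]
    return [REASON_CODES[f] for f in top if drops[f] > 0]

tx = {"amount": 400.0, "txns_1h": 6.0, "ip_country_mismatch": 1.0}
print(reason_codes(tx))
```

This is one of several options named above; SHAP on a surrogate or integrated gradients would give finer-grained attributions, but occlusion over feature groups is cheap enough to run inline on every decline and maps cleanly to a fixed reason-code table.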