Business Context
You’re interviewing for a Senior ML Engineer role at SwiftPay, a global payment processor that handles ~35M card-not-present (CNP) transactions/day across North America and Europe. Fraud losses are trending upward after a product launch that reduced checkout friction. Each basis-point change in fraud rate moves $8–12M/year in chargebacks and operational costs. The fraud team needs a model that can score transactions in real time to decide whether to approve, decline, or step-up authenticate.
SwiftPay already has a strong gradient-boosted tree baseline, but leadership wants to understand whether a neural network can improve AUC-PR and recall at low FPR while maintaining tight latency and explainability requirements.
This question is intentionally framed like a classic interview prompt: “Discuss the architecture of a neural network you recently built. Why did you choose that specific topology?” Here, it is grounded in a realistic production setting.
Dataset
You are given a curated training set built from the last 90 days of traffic.
| Feature Group | Count | Examples | Notes |
|---|---|---|---|
| Transaction attributes | 18 | amount, currency, merchant_category, is_recurring, channel | Heavy-tailed amounts; some leakage-prone fields removed |
| Customer behavior aggregates | 22 | txns_1h, txns_24h, spend_7d, distinct_merchants_30d | Computed from event streams; must be reproducible online |
| Device / network | 14 | device_fingerprint_hash, ip_asn, ip_country, user_agent_family | High-cardinality categoricals |
| Merchant risk signals | 9 | merchant_chargeback_rate_30d, merchant_age_days | Slowly changing |
| Temporal | 6 | hour_of_day, day_of_week, holiday_flag | Strong seasonality |
- Size: ~120M labeled transactions, 69 features
- Target: Binary — fraud_chargeback_within_60d (1) vs non-fraud (0)
- Class balance: 0.35% positive (highly imbalanced)
- Missing data: ~8% missing in device/network features (ad blockers, privacy settings), ~3% missing in aggregates for new customers
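The behavioral aggregates must be computed from events strictly before the scored transaction, both in training and online, or they leak the label. A minimal sketch of a leakage-safe window count over a customer's sorted event history (function name and timestamps are illustrative, not from the prompt):

```python
import bisect

def txns_in_window(event_times, t, window_s):
    """Count prior events in (t - window_s, t), excluding the current
    transaction at time t, so the feature uses strictly past data only.
    `event_times` is the customer's sorted transaction timestamps (epoch s)."""
    lo = bisect.bisect_right(event_times, t - window_s)
    hi = bisect.bisect_left(event_times, t)  # exclude events at or after t
    return hi - lo

# Hypothetical customer history (epoch seconds)
history = [100, 3_500, 3_900, 4_000, 86_000]
print(txns_in_window(history, 4_000, 3_600))   # txns_1h at the 4,000 s txn -> 2
```

Because the same function runs against the event stream online and against historical events during training-set construction, it also satisfies the feature-store parity constraint below.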
Success Criteria
You are optimizing for business outcomes and operational constraints:
- Recall ≥ 70% at FPR ≤ 0.20% on a time-based holdout week (fraud ops can manually review only a small fraction of traffic).
- AUC-PR ≥ 0.20 (baseline is ~0.16 on the same split).
- p95 inference latency ≤ 15 ms per transaction on CPU (single request), including preprocessing.
- Calibrated probabilities: Expected Calibration Error (ECE) ≤ 0.02 so risk thresholds are stable.
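Both the recall-at-FPR-cap criterion and ECE can be computed directly from held-out scores. A dependency-free sketch, assuming `scores`/`probs` are per-transaction fraud probabilities and `labels` are 0/1 (the toy inputs below are illustrative):

```python
def recall_at_fpr(scores, labels, max_fpr=0.002):
    """Best recall achievable at any threshold whose FPR stays <= max_fpr.
    Sweeps thresholds in descending score order; returns (recall, threshold),
    where a transaction is flagged if score >= threshold."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    pos = sum(labels)
    neg = len(labels) - pos
    tp = fp = 0
    best_recall, best_thr = 0.0, 1.0
    for i in order:
        if labels[i] == 1:
            tp += 1
        else:
            fp += 1
        if fp / neg <= max_fpr and tp / pos > best_recall:
            best_recall, best_thr = tp / pos, scores[i]
    return best_recall, best_thr

def ece(probs, labels, n_bins=10):
    """Expected Calibration Error: |mean predicted prob - empirical fraud rate|
    per equal-width bin, weighted by bin occupancy."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        bins[min(int(p * n_bins), n_bins - 1)].append((p, y))
    total = len(probs)
    err = 0.0
    for b in bins:
        if b:
            avg_p = sum(p for p, _ in b) / len(b)
            frac_pos = sum(y for _, y in b) / len(b)
            err += len(b) / total * abs(avg_p - frac_pos)
    return err
```

In practice you would run these on the full time-based holdout week; at 0.35% positives, a cap of FPR ≤ 0.20% means flagging well under 1% of traffic.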
Constraints
- No data leakage: splits must be time-based; aggregates must be computed only from past data.
- Regulatory/compliance: must provide a reason-code-style explanation (top contributing features) for declines.
- Feature store parity: training features must match online features exactly.
- Model size: must fit in memory for a low-latency service (target < 50 MB).
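The 50 MB budget is worth sanity-checking before training, since embedding tables for the high-cardinality categoricals dominate parameter count. A back-of-envelope sketch where every cardinality, embedding dimension, and layer width is an illustrative assumption (none are given in the prompt):

```python
# Hypothetical embeddings + MLP topology; fp32 weights.
FLOAT_BYTES = 4

embedding_tables = {                      # (hashed cardinality, embedding dim)
    "device_fingerprint_hash": (100_000, 16),
    "ip_asn":                  (50_000, 12),
    "merchant_category":       (1_000, 8),
    "user_agent_family":       (5_000, 8),
}
emb_params = sum(card * dim for card, dim in embedding_tables.values())

# 69 raw features, 4 consumed by embeddings, replaced by their concatenated dims.
dense_in = 69 - len(embedding_tables) + sum(d for _, d in embedding_tables.values())
layers = [dense_in, 256, 128, 64, 1]      # illustrative MLP trunk
mlp_params = sum(a * b + b for a, b in zip(layers, layers[1:]))  # weights + biases

total = emb_params + mlp_params
print(f"{total:,} params = {total * FLOAT_BYTES / 1e6:.1f} MB fp32")
```

Even with generous hash bucket counts, this lands around 9 MB fp32, comfortably inside the 50 MB budget; it is the embedding cardinalities, not the MLP depth, that would blow it.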
Deliverables (what you must produce in the interview)
- Propose a neural network architecture/topology for this tabular + high-cardinality categorical problem (e.g., embeddings + MLP, wide & deep, DeepFM-style interactions, residual MLP).
- Explain why you chose that topology over alternatives (GBDT, logistic regression, pure MLP, tabular transformers, etc.), explicitly tying choices to:
  - imbalance and label noise (chargebacks)
  - feature types (numerical + categorical + aggregates)
  - latency and memory constraints
  - calibration and thresholding needs
- Describe your training recipe: loss function, sampling strategy, regularization, early stopping, and hyperparameter tuning.
- Define your evaluation plan: metrics, time splits, and how you would pick an operating threshold.
- Explain how you would make the model explainable enough for compliance (e.g., SHAP on a surrogate, integrated gradients, monotonic constraints via feature engineering, reason-code mapping).
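One reason-code mechanism worth sketching in the interview is occlusion-style attribution: re-score the transaction with each feature replaced by a neutral baseline and rank features by the resulting score drop, then map the top contributors to fixed codes. A hedged sketch with a stand-in linear scorer; the feature names, baselines, weights, and codes are all hypothetical:

```python
BASELINE = {"amount": 50.0, "txns_1h": 0.0, "ip_country_mismatch": 0.0}
REASON_CODES = {
    "amount": "R01: unusually high amount",
    "txns_1h": "R07: rapid transaction velocity",
    "ip_country_mismatch": "R12: IP / billing country mismatch",
}

def score(x):
    """Stand-in risk score; in production this calls the served network."""
    return 0.002 * x["amount"] + 0.05 * x["txns_1h"] + 0.3 * x["ip_country_mismatch"]

def reason_codes(x, top_k=2):
    """Rank features by score drop when each is occluded with its baseline;
    return the codes for the top_k features that actually raised the score."""
    base = score(x)
    drops = {name: base - score({**x, name: neutral})
             for name, neutral in BASELINE.items()}
    top = sorted(drops, key=drops.get, reverse=True)[:top_k]
    return [REASON_CODES[f] for f in top if drops[f] > 0]

tx = {"amount": 400.0, "txns_1h": 6.0, "ip_country_mismatch": 1.0}
print(reason_codes(tx))
```

This is one of several options named above; SHAP on a surrogate or integrated gradients would give finer-grained attributions, but occlusion over feature groups is cheap enough to run inline on every decline and maps cleanly to a fixed reason-code table.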