Detect Card Fraud with Imbalanced Data

Business Context

PayFlow processes roughly 12 million card transactions per day across web and mobile checkout. Fraud losses are rising, and the risk team needs a model that flags suspicious transactions in near real time without overwhelming manual reviewers or blocking too many legitimate payments.

Dataset

You are given a historical transaction dataset built from the last 9 months of activity.

Feature Group	Count	Examples
Transaction attributes	12	amount, merchant_category, payment_method, currency, hour_of_day
Customer behavior	10	transactions_24h, avg_amount_30d, chargebacks_90d, device_count_7d
Device / network	8	device_id_hash, ip_country, vpn_flag, browser_family
Merchant signals	6	merchant_risk_score, refund_rate_30d, dispute_rate_90d
Derived velocity features	9	amount_zscore_user, cards_per_device_24h, failed_attempts_1h

Rows: 4.8M transactions, 45 features
Target: is_fraud (1 = confirmed fraud/chargeback, 0 = legitimate)
Class balance: 0.42% fraud, 99.58% non-fraud
Missing data: 18% missing in merchant risk features for new merchants, 6% missing in device attributes. The data also contains sparse, high-cardinality categoricals.
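To make the class balance concrete, the short calculation below works out how many fraud rows the dataset contains and what a trivial "always legitimate" model would score. It is a minimal sketch using only the figures stated above; the variable names are illustrative.

```python
# Figures taken from the dataset summary above.
n_rows = 4_800_000
fraud_rate = 0.0042  # 0.42% fraud

n_fraud = round(n_rows * fraud_rate)
n_legit = n_rows - n_fraud

# A model that labels every transaction "legitimate" is correct on
# every legitimate row and wrong on every fraud row.
all_negative_accuracy = n_legit / n_rows
all_negative_recall = 0 / n_fraud

# For a scorer that ranks transactions at random, expected precision
# equals the fraud prevalence at every recall level, so the PR-AUC
# baseline is roughly the fraud rate itself.
random_pr_auc_baseline = fraud_rate

print(f"fraud rows:            {n_fraud:,}")
print(f"legitimate rows:       {n_legit:,}")
print(f"all-negative accuracy: {all_negative_accuracy:.4f}")
print(f"all-negative recall:   {all_negative_recall:.4f}")
print(f"random PR-AUC:         {random_pr_auc_baseline:.4f}")
```

The all-negative model reaches 99.58% accuracy while catching no fraud, which is why accuracy says little here.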

Success Criteria

A strong solution should:

achieve recall >= 75% on fraud cases,
maintain precision >= 12% at the operating threshold,
deliver PR-AUC >= 0.30 on the held-out test set,
score each transaction in under 50 ms p95 for online inference.
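The four criteria above can be checked mechanically once a model produces scores. The sketch below is one possible way to do that in plain Python. It assumes average precision as the PR-AUC estimate (a common choice, but not one the problem statement specifies), a nearest-rank p95, and no tied scores. All function names are hypothetical.

```python
import math


def precision_recall_at(y_true, scores, threshold):
    """Precision and recall when every score >= threshold is flagged."""
    pairs = list(zip(y_true, scores))
    tp = sum(1 for y, s in pairs if s >= threshold and y == 1)
    fp = sum(1 for y, s in pairs if s >= threshold and y == 0)
    fn = sum(1 for y, s in pairs if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def average_precision(y_true, scores):
    """Mean of precision measured at the rank of each fraud case.

    Assumes scores contain no ties.
    """
    ranked = sorted(zip(scores, y_true), key=lambda p: p[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def p95(values):
    """Nearest-rank 95th percentile."""
    ordered = sorted(values)
    return ordered[max(0, math.ceil(0.95 * len(ordered)) - 1)]


def meets_criteria(y_true, scores, threshold, latencies_ms):
    """Return a pass/fail flag for each success criterion."""
    precision, recall = precision_recall_at(y_true, scores, threshold)
    return {
        "recall": recall >= 0.75,
        "precision": precision >= 0.12,
        "pr_auc": average_precision(y_true, scores) >= 0.30,
        "latency": p95(latencies_ms) < 50,
    }


# Toy data for illustration only; not drawn from the PayFlow dataset.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
latencies_ms = [12, 15, 18, 20, 22, 25, 30, 35, 41, 48]

print(meets_criteria(y_true, scores, 0.65, latencies_ms))
print(meets_criteria(y_true, scores, 0.35, latencies_ms))
```

On the toy data, raising the threshold from 0.35 to 0.65 drops recall below the 75% target while the other checks still pass. This is the tradeoff that threshold selection has to manage.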

Constraints

False positives directly impact checkout conversion and customer trust.
The fraud team can manually review at most 8,000 alerts/day.
The solution must be explainable enough to support analyst review and model governance.
Training can run offline daily; inference must support real-time API scoring.
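It helps to put the review-capacity constraint next to the recall and precision targets. The back-of-the-envelope calculation below does so under one loud assumption: that the dataset's 0.42% fraud rate also holds for the 12 million live daily transactions. The dataset's 4.8M rows over 9 months are far fewer than 12M per day would produce, so it appears to be a sample, and the live rate may differ.

```python
# Figures from the problem statement.
daily_txns = 12_000_000
review_capacity = 8_000   # alerts/day the fraud team can review
min_recall = 0.75
min_precision = 0.12

# ASSUMPTION: the dataset's class balance carries over to live traffic.
fraud_rate = 0.0042

expected_fraud_per_day = daily_txns * fraud_rate

# Fraud cases that must be flagged each day to hit the recall target.
required_true_positives = min_recall * expected_fraud_per_day

# Total alerts generated if precision sits exactly at the minimum.
alerts_at_min_precision = required_true_positives / min_precision

# Best case for the review queue alone: every reviewed alert is fraud.
max_recall_from_review_queue = review_capacity / expected_fraud_per_day

print(f"expected fraud/day:          {expected_fraud_per_day:,.0f}")
print(f"true positives needed/day:   {required_true_positives:,.0f}")
print(f"alerts/day at 12% precision: {alerts_at_min_precision:,.0f}")
print(f"recall ceiling via review:   {max_recall_from_review_queue:.1%}")
```

Under this assumption, 8,000 reviewed alerts could cover at most about 16% of daily fraud even at perfect precision. A solution therefore likely needs to say which flagged transactions go to analysts and which are handled by automated actions, or to state a different assumption about the live fraud rate.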

Deliverables

Propose a modeling approach for severe class imbalance.
Build a training pipeline with preprocessing, feature handling, and threshold selection.
Justify the evaluation strategy and why accuracy is not appropriate.
Show how you would tune for recall/precision tradeoffs under review-capacity constraints.
Describe how the model would be deployed, monitored, and retrained in production.
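As a starting point for the capacity-constrained tuning deliverable, the sketch below picks the lowest score threshold that keeps the number of alerts within a budget. It is one possible approach, not a prescribed solution. The helper names are hypothetical, and it assumes the scored sample is representative of live traffic so that the daily budget can be scaled down proportionally.

```python
import math


def scaled_alert_budget(daily_capacity, daily_txns, sample_size):
    """Scale the daily review capacity down to a validation sample."""
    return math.floor(daily_capacity * sample_size / daily_txns)


def threshold_for_alert_budget(scores, max_alerts):
    """Lowest threshold t such that at most max_alerts scores are >= t."""
    if not scores or max_alerts <= 0:
        return float("inf")  # flag nothing
    ranked = sorted(scores, reverse=True)
    if max_alerts >= len(ranked):
        return ranked[-1]  # budget covers every transaction
    threshold = ranked[max_alerts - 1]
    # If the next score ties with the cut-off, flagging at this threshold
    # would exceed the budget, so step up to the next distinct score.
    if ranked[max_alerts] == threshold:
        higher = [s for s in ranked if s > threshold]
        return min(higher) if higher else float("inf")
    return threshold


# Toy scores for illustration only.
scores = [0.91, 0.15, 0.78, 0.40, 0.78, 0.05, 0.62, 0.33]

budget = scaled_alert_budget(8_000, 12_000_000, 6_000)
print(budget)                                   # alerts allowed in a 6,000-row sample
print(threshold_for_alert_budget(scores, 4))
print(threshold_for_alert_budget(scores, 2))    # tie at 0.78 forces a higher cut-off
```

Precision and recall measured at the resulting threshold show whether the capacity constraint and the success criteria can be met together, or whether the alert stream needs to be split between manual review and automated handling.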
