Detect Card Fraud with Imbalanced Data

Business Context

PayFlow processes roughly 8 million card transactions per day. Fraud is rare but costly, and the risk team needs a model that flags suspicious transactions without overwhelming manual reviewers with false positives.

Dataset

You are given a historical transaction dataset for binary classification.

Feature Group	Count	Examples
Transaction attributes	12	amount, currency, merchant_category, card_present, payment_channel
Customer behavior	10	avg_txn_7d, txn_count_24h, chargebacks_90d, account_age_days
Merchant signals	6	merchant_risk_score, country, device_consistency, velocity_score
Device / location	8	device_id_hash, ip_country, distance_from_home, browser_type
Temporal features	4	hour_of_day, day_of_week, is_holiday, seconds_since_last_txn

Size: 1.2M transactions, 40 features
Target: is_fraud (1 = fraudulent, 0 = legitimate)
Class balance: 0.7% fraud, 99.3% non-fraud
Missing data: ~12% missing in device fields, ~4% missing in merchant metadata, sparse missingness elsewhere

Success Criteria

A good solution should improve fraud capture materially over a majority-class baseline, which reaches 99.3% accuracy on this data but catches no fraud at all. Target at least 75% recall on fraud while keeping precision above 15% for the review queue, and achieve PR AUC above 0.30 on the holdout set.
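As a rough illustration of how these criteria could be checked, the sketch below computes precision, recall, and a step-wise estimate of PR AUC (average precision) in plain Python. The labels and scores are made-up toy values, not results from the PayFlow dataset, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def precision_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Precision and recall for the positive (fraud) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise PR AUC estimate: mean precision at the rank of each positive.

    Tied scores are ranked in input order, which is a simplification.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        return 0.0
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            tp += 1
            total += tp / rank
    return total / n_pos


# Toy data for illustration only (3 fraud cases out of 10).
y_true: List[int] = [0, 0, 1, 0, 1, 0, 0, 0, 0, 1]
scores: List[float] = [0.1, 0.4, 0.8, 0.2, 0.3, 0.05, 0.6, 0.15, 0.25, 0.9]

# Majority-class baseline: predict "legitimate" for everything.
baseline_precision, baseline_recall = precision_recall(y_true, [0] * len(y_true))

# A scored model evaluated at an arbitrary threshold of 0.3.
y_pred = [1 if s >= 0.3 else 0 for s in scores]
model_precision, model_recall = precision_recall(y_true, y_pred)
model_ap = average_precision(y_true, scores)

print(baseline_precision, baseline_recall)
print(model_precision, model_recall, model_ap)
```

The baseline catches no fraud regardless of its accuracy, which is why the criteria are stated in terms of recall, precision, and PR AUC rather than accuracy.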

Constraints

Batch scoring every 15 minutes; average inference latency should stay under 50 ms per transaction.
The fraud operations team needs feature-level explanations for flagged transactions.
False positives are expensive because analysts can review only ~5,000 transactions per day.
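To see how the review capacity interacts with the recall and precision targets, a back-of-the-envelope calculation can help. The sketch below assumes the dataset's 0.7% fraud rate also applies to the roughly 8 million daily transactions, which the problem statement does not confirm. The function names are hypothetical.

```python
from typing import Dict

DAILY_TXNS = 8_000_000    # from the business context
FRAUD_RATE = 0.007        # dataset class balance; ASSUMED to hold in production
TARGET_RECALL = 0.75      # success criterion
MIN_PRECISION = 0.15      # success criterion
REVIEW_CAPACITY = 5_000   # analyst reviews per day


def review_load(daily_txns: float, fraud_rate: float,
                recall: float, precision: float) -> Dict[str, float]:
    """Daily fraud count, frauds caught, and alerts raised at a given operating point."""
    frauds = daily_txns * fraud_rate
    caught = frauds * recall
    flagged = caught / precision  # precision = caught / flagged
    return {"frauds": frauds, "caught": caught, "flagged": flagged}


def max_alert_rate(daily_txns: float, capacity: float) -> float:
    """Largest share of transactions that can be flagged without exceeding capacity."""
    return capacity / daily_txns


def max_recall_at_capacity(frauds: float, capacity: float) -> float:
    """Upper bound on recall if every reviewed alert were a true fraud."""
    return min(1.0, capacity / frauds)


load = review_load(DAILY_TXNS, FRAUD_RATE, TARGET_RECALL, MIN_PRECISION)
alert_rate = max_alert_rate(DAILY_TXNS, REVIEW_CAPACITY)
recall_cap = max_recall_at_capacity(load["frauds"], REVIEW_CAPACITY)

print(f"alerts per day at target operating point: {load['flagged']:,.0f}")
print(f"max alert rate within capacity: {alert_rate:.4%}")
print(f"max recall via manual review alone: {recall_cap:.1%}")
```

Under these assumptions the target operating point would raise roughly 280,000 alerts per day against a capacity of about 5,000. If the assumption holds, the recall target and the manual review limit cannot both be met by the review queue alone, which is worth raising when discussing the decision threshold.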

Deliverables

Build a classification pipeline that handles the severe class imbalance correctly.
Explain how you would compare class weighting, resampling, and threshold tuning.
Choose evaluation metrics appropriate for rare-event detection and justify them.
Produce a validation strategy that avoids leakage from customer or time-based patterns.
Recommend a production-ready model, decision threshold, and monitoring plan.
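One possible way to approach several of these deliverables is sketched below in plain Python: a time-ordered split that also keeps test customers out of the training set, inverse-frequency class weights, and a threshold chosen to meet a recall target. This is an illustration under stated assumptions, not a reference solution. The `customer_id` and `timestamp` fields are hypothetical (neither appears in the feature list), and the helper names are made up.

```python
from collections import Counter
from typing import Dict, Hashable, List, Optional, Sequence, Tuple

Row = Dict[str, object]


def time_customer_split(rows: Sequence[Row], cutoff: float) -> Tuple[List[Row], List[Row]]:
    """Train on rows before `cutoff`; test on later rows from customers unseen in training.

    Assumes each row has hypothetical `timestamp` and `customer_id` keys.
    """
    train = [r for r in rows if r["timestamp"] < cutoff]
    seen = {r["customer_id"] for r in train}
    test = [r for r in rows
            if r["timestamp"] >= cutoff and r["customer_id"] not in seen]
    return train, test


def balanced_class_weights(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * cnt) for cls, cnt in counts.items()}


def pick_threshold(y_true: Sequence[int], scores: Sequence[float],
                   min_recall: float) -> Optional[float]:
    """Highest threshold (flag if score >= threshold) whose recall meets `min_recall`."""
    n_pos = sum(y_true)
    if n_pos == 0:
        return None
    tp = 0
    for score, label in sorted(zip(scores, y_true), reverse=True):
        tp += label
        if tp / n_pos >= min_recall:
            return score
    return None


# Toy data for illustration only.
rows = [
    {"customer_id": "a", "timestamp": 1},
    {"customer_id": "b", "timestamp": 2},
    {"customer_id": "a", "timestamp": 5},  # later row from a training customer: dropped
    {"customer_id": "c", "timestamp": 6},
]
train_rows, test_rows = time_customer_split(rows, cutoff=4)

weights = balanced_class_weights([0] * 99 + [1])

y_true = [0, 0, 1, 0, 1, 0, 0, 0, 0, 1]
scores = [0.1, 0.4, 0.8, 0.2, 0.3, 0.05, 0.6, 0.15, 0.25, 0.9]
threshold = pick_threshold(y_true, scores, min_recall=0.75)

print(len(train_rows), len(test_rows), weights, threshold)
```

Dropping later rows from customers already seen in training is a strict choice: it guards against customer-level leakage but tilts the test set toward new customers, so a purely time-based split may be reported alongside it. The threshold should be picked on a validation fold rather than on the final holdout set.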
