Business Context
Microsoft Store processes millions of payment attempts each month across cards, wallets, gift balances, and digital subscriptions. Fraud losses are material, but excessive false positives also block legitimate customers and create support costs, so the fraud model must trade recall against precision carefully on a highly imbalanced dataset.
Dataset
You are given a historical transaction dataset used for post-authorization fraud labeling.
| Feature Group | Count | Examples |
|---|---|---|
| Transaction attributes | 14 | amount, currency, payment_method, merchant_category, is_digital_good |
| Customer behavior | 11 | account_age_days, prior_chargebacks_90d, failed_logins_7d, avg_order_value_30d |
| Device and network | 9 | device_id_hash, browser_family, IP_country, ASN_risk_score |
| Velocity features | 8 | txns_last_10m, cards_per_device_24h, amount_sum_1h, distinct_accounts_per_ip_24h |
| Risk signals | 6 | AVS_result, CVV_result, 3DS_used, email_domain_risk, geodistance_km |
- Size: 4.8M transactions over 9 months, 48 modeled features
- Target: Binary fraud label confirmed within 45 days of transaction settlement
- Class balance: 0.42% fraud, 99.58% non-fraud
- Missing data: 18% missing in AVS/CVV-related fields, 7% missing in device attributes, and sparse values for new users
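Two dataset properties above shape the modeling work directly: the 0.42% fraud rate and the structured missingness in AVS/CVV and device fields. A minimal sketch of handling both, using a synthetic stand-in for the transaction table (the `AVS_result` column name comes from the brief; `is_fraud` and the distributions are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the transaction table; real data would be loaded instead.
rng = np.random.default_rng(0)
n = 10_000
df = pd.DataFrame({
    "amount": rng.lognormal(3, 1, n),
    "AVS_result": rng.choice(["Y", "N", None], n, p=[0.62, 0.20, 0.18]),
    "is_fraud": (rng.random(n) < 0.0042).astype(int),
})

# Missingness in AVS/CVV fields is often informative; encode it explicitly
# rather than imputing it away.
df["AVS_missing"] = df["AVS_result"].isna().astype(int)

# Imbalance ratio, commonly passed to boosted-tree learners as a
# positive-class weight (e.g. scale_pos_weight in XGBoost/LightGBM).
pos = int(df["is_fraud"].sum())
scale_pos_weight = (len(df) - pos) / max(pos, 1)
print(f"fraud rate: {pos / len(df):.4%}, pos-class weight: {scale_pos_weight:.0f}")
```

At the stated 0.42% base rate this weight lands near 237, which is why plain accuracy is useless here and ranking metrics dominate the success criteria below.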
Success Criteria
A strong solution should:
- achieve recall >= 75% on fraudulent transactions,
- maintain precision >= 18% at the operating threshold,
- deliver PR-AUC >= 0.30, and
- produce a ranked fraud score usable in Microsoft Azure batch scoring and near-real-time review queues.
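The criteria above combine a threshold-dependent operating point (recall, precision) with a threshold-free ranking metric (PR-AUC). A sketch of how all three would be evaluated with scikit-learn, on synthetic scores (the score distributions and the 0.55 threshold are illustrative assumptions, not values from the brief):

```python
import numpy as np
from sklearn.metrics import average_precision_score, precision_score, recall_score

# Synthetic labels at roughly the brief's 0.42% fraud rate, with model scores
# shifted upward for the fraud class so the metrics are non-degenerate.
rng = np.random.default_rng(42)
y_true = (rng.random(50_000) < 0.0042).astype(int)
y_score = np.clip(rng.beta(2, 8, 50_000) + y_true * rng.beta(5, 3, 50_000) * 0.6, 0, 1)

threshold = 0.55  # the real operating threshold would be tuned on validation data
y_pred = (y_score >= threshold).astype(int)

print("recall:   ", recall_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred, zero_division=0))
# average_precision_score is the standard PR-AUC estimate.
print("PR-AUC:   ", average_precision_score(y_true, y_score))
```

Note that PR-AUC is evaluated on the raw scores, not the thresholded predictions, which is what makes the same model usable for both batch scoring and ranked review queues.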
Constraints
- Inference latency must stay under 50 ms p95 per transaction in Azure Machine Learning online endpoints.
- The fraud operations team needs feature-level explanations for manual review.
- Labels arrive with delay, so validation must avoid temporal leakage.
- The review queue can only inspect the top 1.5% of scored transactions.
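Two of these constraints translate directly into evaluation code: delayed labels require an out-of-time split, and the 1.5% review capacity makes precision-at-top-k the metric the operations team actually experiences. A sketch of both, where `temporal_split` and `precision_at_rate` are illustrative helpers rather than anything named in the brief:

```python
import numpy as np
import pandas as pd

def temporal_split(df, time_col, train_frac=0.8):
    """Out-of-time split: train on earlier transactions, validate on later ones,
    so delayed fraud labels never leak future information into training."""
    cutoff = df[time_col].quantile(train_frac)
    return df[df[time_col] <= cutoff], df[df[time_col] > cutoff]

def precision_at_rate(y_true, y_score, review_rate=0.015):
    """Precision among the top-scored fraction the review queue can inspect."""
    k = max(1, int(len(y_score) * review_rate))
    top_idx = np.argsort(y_score)[::-1][:k]
    return float(np.asarray(y_true)[top_idx].mean())
```

In practice a gap period (here, 45 days) would also be left between the train and validation windows so that every training label has had time to mature.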
Deliverables
- Propose a modeling strategy for severe class imbalance in fraud detection.
- Explain how you would split data, engineer features, and avoid leakage.
- Train and evaluate a production-ready classifier with threshold tuning.
- Show how you would measure business impact using ranking and classification metrics.
- Describe deployment and monitoring considerations in Azure Machine Learning.
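For the threshold-tuning deliverable, one common approach is to pick the highest-precision operating point that still clears the recall floor from the success criteria. A sketch using scikit-learn's precision-recall curve; `tune_threshold` is an illustrative helper, and the recall floor of 0.75 mirrors the target stated above:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

def tune_threshold(y_true, y_score, min_recall=0.75):
    """Return the threshold that maximizes precision subject to a recall floor."""
    precision, recall, thresholds = precision_recall_curve(y_true, y_score)
    # precision/recall have one more entry than thresholds; drop the final
    # (recall=0) point so all three arrays align.
    ok = recall[:-1] >= min_recall
    if not ok.any():
        raise ValueError("no threshold reaches the recall target")
    best = np.argmax(np.where(ok, precision[:-1], -1.0))
    return thresholds[best]
```

The tuned threshold would be fixed on the out-of-time validation window and then monitored in production, since score distributions drift as fraud patterns change.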