Business Context
PayLink, a mid-market payments platform processing 8 million card transactions per day, has a fraud detection model that its data science team developed in a Jupyter Notebook. The model performs well offline, but the company now needs a scalable AWS deployment that supports low-latency online scoring of checkout traffic and reproducible retraining for weekly model updates.
Dataset
The current notebook uses a historical transaction dataset with engineered customer and merchant features.
| Feature Group | Count | Examples |
|---|---|---|
| Transaction attributes | 12 | amount, currency, payment_method, device_type |
| Customer behavior | 10 | transactions_24h, avg_amount_30d, chargebacks_90d |
| Merchant attributes | 6 | merchant_category, merchant_risk_score, country |
| Derived time features | 5 | hour_of_day, day_of_week, is_holiday, account_age_days |
| Risk signals | 7 | ip_velocity, card_bin_risk, email_domain_risk |
- Size: 42 million transactions over 18 months, 40 tabular features
- Target: Binary fraud label from confirmed chargebacks and manual review outcomes
- Class balance: Highly imbalanced — 0.7% fraud, 99.3% non-fraud
- Missing data: 8% missing in merchant risk fields, 3% missing in customer history for new users
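The imbalance and missingness shape modeling choices from the start. As a minimal sketch, assuming a parquet export with hypothetical `event_time` and `is_fraud` columns, a time-based split plus a NaN-tolerant, class-weighted learner handles both issues without a separate imputation stage:

```python
import pandas as pd
from sklearn.ensemble import HistGradientBoostingClassifier

# Hypothetical file and column names -- the real schema lives in the notebook.
df = pd.read_parquet("transactions.parquet").sort_values("event_time")

# Time-based split: train on the first ~15 months, hold out the last ~3, so the
# holdout mimics how a weekly-retrained model actually meets future traffic.
cutoff = df["event_time"].quantile(0.85)
train, test = df[df["event_time"] < cutoff], df[df["event_time"] >= cutoff]

features = [c for c in df.columns if c not in ("event_time", "is_fraud")]
# HistGradientBoostingClassifier tolerates NaNs natively (covering the missing
# merchant-risk and new-customer fields), and class_weight="balanced" offsets
# the 0.7% positive rate; categorical columns still need numeric encoding first.
clf = HistGradientBoostingClassifier(class_weight="balanced", random_state=0)
clf.fit(train[features], train["is_fraud"])
scores = clf.predict_proba(test[features])[:, 1]
```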
Success Criteria
A strong solution should:
- achieve PR-AUC >= 0.42 on the holdout test set,
- maintain recall >= 75% at precision >= 20% for the fraud class (threshold selection for this operating point is sketched after this list),
- support p95 online inference latency under 120 ms,
- provide a clear path from notebook code to versioned, reproducible AWS deployment.
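The first two criteria become directly testable with a small helper that computes PR-AUC and picks the operating threshold from holdout scores. A minimal sketch with scikit-learn, assuming `y_true` and `scores` come from the holdout set (the function name is ours):

```python
import numpy as np
from sklearn.metrics import average_precision_score, precision_recall_curve

def select_threshold(y_true, scores, min_precision=0.20):
    """Report PR-AUC and the max-recall threshold meeting the precision floor."""
    pr_auc = average_precision_score(y_true, scores)  # PR-AUC as average precision
    precision, recall, thresholds = precision_recall_curve(y_true, scores)
    ok = np.where(precision[:-1] >= min_precision)[0]  # final point has no threshold
    if ok.size == 0:
        raise ValueError("no threshold reaches the precision floor")
    best = ok[np.argmax(recall[ok])]  # highest recall among qualifying thresholds
    return {
        "pr_auc": pr_auc,                  # target: >= 0.42
        "threshold": float(thresholds[best]),
        "recall": float(recall[best]),     # target: >= 0.75
        "precision": float(precision[best]),
    }
```

Freezing the selected threshold alongside the model artifact keeps the precision/recall operating point reproducible across deployments.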
Constraints
- Online scoring must scale to peak checkout traffic without manual intervention.
- Feature transformations used in training and inference must be identical; a pipeline-serialization sketch follows this list.
- The fraud operations team needs model versioning, rollback capability, and basic explainability.
- Budget should favor managed AWS services over a large custom platform.
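The train/serve parity constraint is commonly met by fitting preprocessing and model as a single artifact. A minimal sketch using a scikit-learn Pipeline, with hypothetical column lists and a stand-in classifier; the toy DataFrame exists only to make the example self-contained:

```python
import joblib
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical column groups; the real lists come from the notebook's schema.
NUMERIC = ["amount", "merchant_risk_score"]
CATEGORICAL = ["currency", "merchant_category"]

preprocess = ColumnTransformer([
    ("num", SimpleImputer(strategy="median"), NUMERIC),
    ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
])
model = Pipeline([("prep", preprocess), ("clf", GradientBoostingClassifier())])

# Toy data so the sketch runs end to end.
X = pd.DataFrame({
    "amount": [12.0, 250.0, None, 40.0],
    "merchant_risk_score": [0.2, 0.9, None, 0.1],
    "currency": ["USD", "EUR", "USD", "USD"],
    "merchant_category": ["retail", "travel", "retail", "retail"],
})
y = [0, 1, 0, 0]
model.fit(X, y)

# One serialized artifact holds both transformations and model, so the
# inference endpoint cannot apply preprocessing that differs from training.
joblib.dump(model, "fraud-model-v1.joblib")
scorer = joblib.load("fraud-model-v1.joblib")  # e.g. inside the serving container
print(scorer.predict_proba(X)[:, 1])
```

Versioning this artifact (for example, one file per model version in the registry of your choice) also gives the fraud operations team a concrete rollback unit.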
Deliverables
- Design a production ML workflow that converts the notebook into a trainable, testable Python package.
- Build a classification pipeline with preprocessing, training, and threshold selection.
- Describe how you would deploy the model on AWS for real-time inference and weekly retraining.
- Define monitoring for prediction quality, latency, drift, and failed inferences (see the metrics sketch below).
- Explain tradeoffs between batch and online features, model complexity, and operational cost.
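For the monitoring deliverable, one common pattern on AWS is to emit per-request metrics to CloudWatch and alarm on them. A minimal sketch with boto3, assuming credentials and region are already configured; the namespace and metric names are our own:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")  # assumes AWS credentials/region are set up

def record_inference(latency_ms: float, score: float, failed: bool) -> None:
    """Emit per-request metrics; CloudWatch alarms on p95 latency, failure
    rate, and shifts in the score distribution then cover the basics."""
    cloudwatch.put_metric_data(
        Namespace="PayLink/FraudModel",  # hypothetical namespace
        MetricData=[
            {"MetricName": "InferenceLatency", "Value": latency_ms, "Unit": "Milliseconds"},
            {"MetricName": "FraudScore", "Value": score, "Unit": "None"},
            {"MetricName": "FailedInference", "Value": 1.0 if failed else 0.0, "Unit": "Count"},
        ],
    )
```

Feature drift usually needs more than per-request gauges (for example, comparing feature and score distributions between training and serving windows on a schedule), but latency, failure-rate, and score-shift alarms are a reasonable first line of defense.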