Business Context
PayLink, a mid-market payments platform processing 8 million card transactions per day, has a fraud detection model that its data science team developed in a Jupyter Notebook. The model performs well offline, but the company now needs a scalable AWS deployment that supports low-latency online scoring of checkout traffic and reproducible retraining for weekly model updates.
Dataset
The current notebook uses a historical transaction dataset with engineered customer and merchant features.
| Feature Group | Count | Examples |
|---|---|---|
| Transaction attributes | 12 | amount, currency, payment_method, device_type |
| Customer behavior | 10 | transactions_24h, avg_amount_30d, chargebacks_90d |
| Merchant attributes | 6 | merchant_category, merchant_risk_score, country |
| Derived time features | 5 | hour_of_day, day_of_week, is_holiday, account_age_days |
| Risk signals | 7 | ip_velocity, card_bin_risk, email_domain_risk |
- Size: 42 million transactions over 18 months, 40 tabular features
- Target: Binary fraud label from confirmed chargebacks and manual review outcomes
- Class balance: Highly imbalanced — 0.7% fraud, 99.3% non-fraud
- Missing data: 8% missing in merchant risk fields, 3% missing in customer history for new users
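The imbalance and missingness shape modeling choices from the start. As a minimal sketch, assuming a parquet export with hypothetical `event_time` and `is_fraud` columns, a time-based split plus a NaN-tolerant, class-weighted learner handles both issues without a separate imputation stage:

```python
import pandas as pd
from sklearn.ensemble import HistGradientBoostingClassifier

# Hypothetical file and column names -- the real schema lives in the notebook.
df = pd.read_parquet("transactions.parquet").sort_values("event_time")

# Time-based split: train on the first ~15 months, hold out the last ~3, so the
# holdout mimics how a weekly-retrained model actually meets future traffic.
cutoff = df["event_time"].quantile(0.85)
train, test = df[df["event_time"] < cutoff], df[df["event_time"] >= cutoff]

features = [c for c in df.columns if c not in ("event_time", "is_fraud")]
# HistGradientBoostingClassifier tolerates NaNs natively (covering the missing
# merchant-risk and new-customer fields), and class_weight="balanced" offsets
# the 0.7% positive rate; categorical columns still need numeric encoding first.
clf = HistGradientBoostingClassifier(class_weight="balanced", random_state=0)
clf.fit(train[features], train["is_fraud"])
scores = clf.predict_proba(test[features])[:, 1]
```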
Success Criteria
A strong solution should:
- achieve PR-AUC >= 0.42 on the holdout test set,
- maintain recall >= 75% at precision >= 20% for the fraud class (threshold selection for this operating point is sketched after this list),
- support p95 online inference latency under 120 ms,
- provide a clear path from notebook code to versioned, reproducible AWS deployment.
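The first two criteria become directly testable with a small helper that computes PR-AUC and picks the operating threshold from holdout scores. A minimal sketch with scikit-learn, assuming `y_true` and `scores` come from the holdout set (the function name is ours):

```python
import numpy as np
from sklearn.metrics import average_precision_score, precision_recall_curve

def select_threshold(y_true, scores, min_precision=0.20):
    """Report PR-AUC and the max-recall threshold meeting the precision floor."""
    pr_auc = average_precision_score(y_true, scores)  # PR-AUC as average precision
    precision, recall, thresholds = precision_recall_curve(y_true, scores)
    ok = np.where(precision[:-1] >= min_precision)[0]  # final point has no threshold
    if ok.size == 0:
        raise ValueError("no threshold reaches the precision floor")
    best = ok[np.argmax(recall[ok])]  # highest recall among qualifying thresholds
    return {
        "pr_auc": pr_auc,                  # target: >= 0.42
        "threshold": float(thresholds[best]),
        "recall": float(recall[best]),     # target: >= 0.75
        "precision": float(precision[best]),
    }
```

Freezing the selected threshold alongside the model artifact keeps the precision/recall operating point reproducible across deployments.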
Constraints
- Online scoring must scale to peak checkout traffic without manual intervention.
- Feature transformations used in training and inference must be identical; a pipeline-serialization sketch follows this list.
- The fraud operations team needs model versioning, rollback capability, and basic explainability.
- Budget should favor managed AWS services over a large custom platform.
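The train/serve parity constraint is commonly met by fitting preprocessing and model as a single artifact. A minimal sketch using a scikit-learn Pipeline, with hypothetical column lists and a stand-in classifier; the toy DataFrame exists only to make the example self-contained:

```python
import joblib
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical column groups; the real lists come from the notebook's schema.
NUMERIC = ["amount", "merchant_risk_score"]
CATEGORICAL = ["currency", "merchant_category"]

preprocess = ColumnTransformer([
    ("num", SimpleImputer(strategy="median"), NUMERIC),
    ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
])
model = Pipeline([("prep", preprocess), ("clf", GradientBoostingClassifier())])

# Toy data so the sketch runs end to end.
X = pd.DataFrame({
    "amount": [12.0, 250.0, None, 40.0],
    "merchant_risk_score": [0.2, 0.9, None, 0.1],
    "currency": ["USD", "EUR", "USD", "USD"],
    "merchant_category": ["retail", "travel", "retail", "retail"],
})
y = [0, 1, 0, 0]
model.fit(X, y)

# One serialized artifact holds both transformations and model, so the
# inference endpoint cannot apply preprocessing that differs from training.
joblib.dump(model, "fraud-model-v1.joblib")
scorer = joblib.load("fraud-model-v1.joblib")  # e.g. inside the serving container
print(scorer.predict_proba(X)[:, 1])
```

Versioning this artifact (for example, one file per model version in the registry of your choice) also gives the fraud operations team a concrete rollback unit.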
Deliverables
- Design a production ML workflow that converts the notebook into a trainable, testable Python package.
- Build a classification pipeline with preprocessing, training, and threshold selection.
- Describe how you would deploy the model on AWS for real-time inference and weekly retraining.
- Define monitoring for prediction quality, latency, drift, and failed inferences (see the metrics sketch below).
- Explain tradeoffs between batch and online features, model complexity, and operational cost.
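For the monitoring deliverable, one common pattern on AWS is to emit per-request metrics to CloudWatch and alarm on them. A minimal sketch with boto3, assuming credentials and region are already configured; the namespace and metric names are our own:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")  # assumes AWS credentials/region are set up

def record_inference(latency_ms: float, score: float, failed: bool) -> None:
    """Emit per-request metrics; CloudWatch alarms on p95 latency, failure
    rate, and shifts in the score distribution then cover the basics."""
    cloudwatch.put_metric_data(
        Namespace="PayLink/FraudModel",  # hypothetical namespace
        MetricData=[
            {"MetricName": "InferenceLatency", "Value": latency_ms, "Unit": "Milliseconds"},
            {"MetricName": "FraudScore", "Value": score, "Unit": "None"},
            {"MetricName": "FailedInference", "Value": 1.0 if failed else 0.0, "Unit": "Count"},
        ],
    )
```

Feature drift usually needs more than per-request gauges (for example, comparing feature and score distributions between training and serving windows on a schedule), but latency, failure-rate, and score-shift alarms are a reasonable first line of defense.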