OpenAI’s billing risk team trains a fraud classifier daily to score card payment attempts and account-level abuse signals. The current training job takes 6+ hours, which delays model refreshes and slows incident response when fraud patterns shift.
You are given a tabular binary classification dataset built from 90 days of payment attempts and account activity. Its 55 features fall into five groups:

| Feature Group | Count | Examples |
|---|---|---|
| Transaction features | 18 | amount_usd, card_bin_risk_score, issuer_country, retry_count |
| Account features | 12 | account_age_days, prior_chargebacks, org_size, payment_method_count |
| Behavioral aggregates | 14 | failed_payments_1d, spend_7d, distinct_cards_30d, login_velocity |
| Temporal features | 6 | hour_of_day, day_of_week, days_since_last_payment |
| Derived ratios | 5 | failed_to_success_ratio, amount_vs_account_median |

Target: is_fraud (chargeback or confirmed payment abuse within 14 days).

A solution is good enough if it reduces end-to-end training time by at least 60% while keeping PR-AUC within 2% relative of the current production model and preserving recall at a 5% alert rate.
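
The acceptance criteria above can be turned into an automated check. The following is a minimal sketch in plain Python, not the team's actual evaluation code. The names `RunMetrics`, `recall_at_alert_rate`, `average_precision`, and `meets_acceptance` are hypothetical. It assumes that "within 2% relative" means the candidate's PR-AUC may drop by at most 2% of the baseline value, that "preserving recall" means no drop beyond a configurable tolerance (zero by default), and that the 5% alert rate means alerting on the top 5% of scored events. Average precision is used as the PR-AUC estimate.

```python
import math
from dataclasses import dataclass


@dataclass
class RunMetrics:
    """Metrics for one training run (hypothetical container)."""
    train_seconds: float
    pr_auc: float
    recall_at_alert: float


def recall_at_alert_rate(y_true, scores, alert_rate=0.05):
    """Share of all fraud cases captured when alerting on the top-scored fraction."""
    n = len(y_true)
    total_pos = sum(y_true)
    if n == 0 or total_pos == 0:
        raise ValueError("need at least one example and one positive label")
    k = max(1, math.floor(alert_rate * n))  # number of alerts raised
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    caught = sum(y_true[i] for i in order[:k])
    return caught / total_pos


def average_precision(y_true, scores):
    """Step-wise average precision, a common PR-AUC estimate (ties are not merged)."""
    total_pos = sum(y_true)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    order = sorted(range(len(y_true)), key=lambda i: scores[i], reverse=True)
    hits, ap = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            ap += hits / rank  # precision at this positive's rank
    return ap / total_pos


def meets_acceptance(baseline, candidate, min_time_reduction=0.60,
                     max_pr_auc_rel_drop=0.02, recall_tolerance=0.0):
    """Check each criterion separately; thresholds mirror the problem statement."""
    time_reduction = 1.0 - candidate.train_seconds / baseline.train_seconds
    pr_auc_rel_drop = (baseline.pr_auc - candidate.pr_auc) / baseline.pr_auc
    recall_drop = baseline.recall_at_alert - candidate.recall_at_alert
    return {
        "time_ok": time_reduction >= min_time_reduction,
        "pr_auc_ok": pr_auc_rel_drop <= max_pr_auc_rel_drop,
        "recall_ok": recall_drop <= recall_tolerance,  # assumption: zero tolerance
    }
```

Returning one flag per criterion, rather than a single boolean, makes it easier to see which constraint a faster pipeline violates. The example values in any call (training times, PR-AUC, recall) would come from held-out evaluation of the production and candidate models on the same time-based split.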