Microsoft Marketplace wants to predict whether an active seller will become inactive in the next 60 days so the account team can intervene early. You are given historical seller-level snapshots and need to design a preprocessing and feature engineering pipeline suitable for a production classification model.
The training data is exported from Azure Data Lake and contains one row per seller per month.

| Feature Group | Count | Examples |
|---|---|---|
| Seller profile | 8 | seller_region, business_type, tenure_days, partner_tier |
| Commercial activity | 14 | gross_sales_30d, orders_30d, avg_order_value, refund_rate |
| Engagement | 10 | portal_logins_30d, campaign_clicks_30d, support_tickets_30d |
| Catalog quality | 7 | active_listings, % listings with images, policy_violations_90d |
| Temporal fields | 6 | snapshot_month, days_since_last_sale, sales_trend_3m |
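
Because the data holds one row per seller per month and the prediction window is 60 days, a holdout period can be carved out by `snapshot_month` rather than by random sampling. The sketch below is one possible way to do that with pandas. It assumes `snapshot_month` parses as a date and that a two-month gap is enough for every training label to be fully observed before the holdout begins. The function name `temporal_split` and the `gap_months` parameter are hypothetical, introduced here for illustration only.

```python
import pandas as pd


def temporal_split(df, holdout_start, gap_months=2, month_col="snapshot_month"):
    """Split seller-month snapshots into train and holdout sets by calendar month.

    The last training month is `gap_months` before `holdout_start`, so the
    months in between are dropped. Assumption: with a 60-day label window,
    labels for those months would not yet be observable when the holdout starts.
    """
    # Normalize the snapshot column to monthly periods for safe comparison.
    months = pd.to_datetime(df[month_col]).dt.to_period("M")
    holdout_start = pd.Period(holdout_start, freq="M")
    train_end = holdout_start - gap_months  # last month allowed in training

    train = df[months <= train_end]
    holdout = df[months >= holdout_start]
    return train, holdout
```

For example, with hypothetical monthly snapshots from January to June 2024, `temporal_split(df, "2024-05")` would keep January through March for training, drop April, and hold out May and June. Any encoders, imputers, or aggregate statistics should then be fit on the training rows only.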

Target: inactive_60d = 1 if the seller has no sales activity in the 60 days following the snapshot, and 0 otherwise.

A strong solution should produce a reproducible preprocessing pipeline, avoid temporal leakage, and improve over a simple one-hot + median-imputation baseline. "Good enough" means PR-AUC >= 0.42, ROC-AUC >= 0.80, and recall >= 0.70 at precision >= 0.35 on the holdout period.
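
The baseline and the acceptance thresholds above can be sketched with scikit-learn. This is a minimal illustration rather than a prescribed implementation, and it rests on several assumptions: the column lists are small subsets of the feature groups, logistic regression is a stand-in model (the brief does not name one), PR-AUC is approximated by average precision, and the helper names `build_baseline`, `recall_at_precision`, and `meets_targets` are hypothetical.

```python
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    average_precision_score,
    precision_recall_curve,
    roc_auc_score,
)
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Illustrative subsets of the feature groups, not the full schema.
CATEGORICAL = ["seller_region", "business_type", "partner_tier"]
NUMERIC = ["tenure_days", "gross_sales_30d", "orders_30d", "days_since_last_sale"]


def build_baseline():
    """One-hot + median-imputation baseline wrapped in a single Pipeline."""
    categorical = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        # Ignore categories unseen during fit instead of raising at predict time.
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ])
    preprocess = ColumnTransformer([
        ("cat", categorical, CATEGORICAL),
        ("num", SimpleImputer(strategy="median"), NUMERIC),
    ])
    # Assumption: logistic regression as a placeholder classifier.
    return Pipeline([
        ("preprocess", preprocess),
        ("model", LogisticRegression(max_iter=1000)),
    ])


def recall_at_precision(y_true, scores, min_precision=0.35):
    """Best recall among thresholds whose precision is at least min_precision."""
    precision, recall, _ = precision_recall_curve(y_true, scores)
    # The final curve point (precision=1, recall=0) has no threshold; skip it.
    ok = precision[:-1] >= min_precision
    return float(recall[:-1][ok].max()) if ok.any() else 0.0


def meets_targets(y_true, scores):
    """Check holdout scores against the three thresholds in the brief."""
    return bool(
        average_precision_score(y_true, scores) >= 0.42
        and roc_auc_score(y_true, scores) >= 0.80
        and recall_at_precision(y_true, scores) >= 0.70
    )
```

Keeping imputation and encoding inside the `Pipeline` means they are fit only on the training rows passed to `fit`, which supports both reproducibility and leakage avoidance. The evaluation helpers are meant to be applied to predicted probabilities for the holdout period only.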