Business Context
Amazon Services wants to improve seller quality monitoring in Seller Central. Your task is to show when supervised learning is appropriate and when unsupervised learning is, by building both a seller-risk classifier and an unsupervised seller segmentation workflow on the same dataset.
Dataset
You are given a historical dataset of Amazon Marketplace sellers with monthly aggregates from the last 18 months.
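Every row is a seller-month snapshot and the label looks 60 days ahead. A random row split would therefore mix future outcomes and the same sellers' adjacent months across training and validation. The sketch below shows a minimal time-ordered split. It assumes each record carries one snapshot date per month and treats the 60-day window as two monthly snapshots. The `time_split` and `month_index` helpers and the `n_valid=3` default are hypothetical choices for illustration.

```python
from __future__ import annotations

from datetime import date

# Assumption: the 60-day label window is approximated as two monthly snapshots.
LABEL_WINDOW_MONTHS = 2


def month_index(d: date) -> int:
    """Map a date to a counter that increases by 1 per calendar month."""
    return d.year * 12 + (d.month - 1)


def time_split(months: list[date], n_valid: int = 3) -> tuple[list[date], list[date]]:
    """Hypothetical helper: return (train_months, valid_months), oldest first.

    A training month m is kept only if its outcome window (months m+1 .. m+2)
    closes no later than the first validation snapshot month.
    """
    ordered = sorted(set(months), key=month_index)
    valid = ordered[-n_valid:]
    last_allowed = month_index(valid[0]) - LABEL_WINDOW_MONTHS
    train = [m for m in ordered if month_index(m) <= last_allowed]
    return train, valid
```

With 18 monthly snapshots and three validation months, this split keeps 14 training months and discards the one month whose outcome window overlaps validation. Snapshots whose 60-day window has not yet closed at extraction time may also need to be excluded.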
| Feature group | Feature count | Examples |
|---|---|---|
| Seller profile | 6 | marketplace, tenure_days, business_type, fulfillment_channel |
| Operational metrics | 10 | on_time_ship_rate, cancellation_rate, late_shipment_rate, return_rate |
| Customer experience | 8 | defect_rate, negative_feedback_rate, A-to-z_claim_rate, contact_rate |
| Commercial activity | 7 | orders_30d, GMV_30d, ASP, ad_spend_30d, buy_box_win_rate |
| Support / compliance | 5 | policy_warnings_90d, suspension_history, document_verification_age |
- Rows: 240K seller-month records, 36 features
- Target for the supervised task: high_risk_60d = 1 if the seller receives a policy enforcement action within 60 days after the record's month, else 0
- Class balance: 7.4% positive, 92.6% negative
- Missing data: 12% missing in ad-related fields and 6% in customer-contact metrics; missingness is higher for new sellers
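Missingness is concentrated in ad and contact fields and among new sellers, so the fact that a value is missing may itself carry risk signal. One common treatment is median imputation fitted on training rows only, plus a binary `<field>_missing` flag per field. The sketch below implements it in plain Python. It assumes records are dicts keyed by the feature names above, with `None` for missing values. `fit_medians` and `transform` are hypothetical helpers.

```python
from __future__ import annotations

from statistics import median
from typing import Optional


def fit_medians(rows: list[dict[str, Optional[float]]], fields: list[str]) -> dict[str, float]:
    """Learn per-field medians from training rows only, ignoring missing values."""
    medians: dict[str, float] = {}
    for f in fields:
        observed = [r[f] for r in rows if r.get(f) is not None]
        # Fall back to 0.0 if a field is entirely missing in the training window.
        medians[f] = median(observed) if observed else 0.0
    return medians


def transform(
    rows: list[dict[str, Optional[float]]], fields: list[str], medians: dict[str, float]
) -> list[dict[str, float]]:
    """Impute with training medians and add a binary <field>_missing flag per field."""
    out = []
    for r in rows:
        new = dict(r)
        for f in fields:
            missing = r.get(f) is None
            new[f + "_missing"] = 1.0 if missing else 0.0
            new[f] = medians[f] if missing else r[f]
        out.append(new)
    return out
```

scikit-learn's `SimpleImputer(strategy="median", add_indicator=True)` implements the same idea. For the clustering pipeline, the flags may need down-weighting or exclusion, because binary columns can dominate distances after scaling.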
Success Criteria
A good solution should:
- achieve PR-AUC >= 0.42 on the supervised task,
- produce actionable seller segments with clear behavioral differences for operations teams,
- explain when labels make supervised learning preferable and when unlabeled exploration justifies unsupervised learning.
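PR-AUC suits this task because, at 7.4% prevalence, a random ranking scores a PR-AUC of roughly the positive rate, about 0.074. The 0.42 target is therefore roughly a 5.7x lift over chance. The sketch below computes average precision, a standard step-wise estimate of PR-AUC and the quantity `sklearn.metrics.average_precision_score` reports. It is written in plain Python and, for brevity, does not handle tied scores.

```python
from __future__ import annotations

# Expected PR-AUC of a random ranking is roughly the positive rate.
BASELINE_PR_AUC = 0.074
TARGET_PR_AUC = 0.42


def average_precision(labels: list[int], scores: list[float]) -> float:
    """Mean of the precision values at the rank of each positive (ties not handled)."""
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("average precision is undefined without positives")
    # Rank records from highest to lowest predicted risk.
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / n_pos
```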
Constraints
- Batch scoring in Amazon SageMaker must complete daily for ~150K active sellers.
- Risk outputs must be interpretable enough for operations review.
- Segmentation should be stable month over month and not depend on manual labeling.
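Cluster IDs from a monthly re-run are arbitrary, so stability should be measured with a score that ignores label permutations. The sketch below compares consecutive months with the adjusted Rand index (1.0 for identical partitions, about 0 for chance agreement). It assumes segment assignments are stored per seller ID each month. The helper names are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from math import comb


def adjusted_rand_index(a: list[int], b: list[int]) -> float:
    """Adjusted Rand index between two cluster assignments of the same sellers."""
    if len(a) != len(b):
        raise ValueError("assignments must be aligned on the same sellers")
    n = len(a)
    if n < 2:
        raise ValueError("need at least two sellers")
    index = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(a).values())
    sum_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. one cluster in both months
    return (index - expected) / (max_index - expected)


def month_over_month_stability(prev: dict[str, int], curr: dict[str, int]) -> float:
    """Compare segment assignments for sellers present in both months."""
    common = sorted(prev.keys() & curr.keys())
    return adjusted_rand_index([prev[s] for s in common], [curr[s] for s in common])
```

The brief does not fix an acceptable ARI floor, so that threshold is a product decision. scikit-learn's `adjusted_rand_score` computes the same quantity.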
Deliverables
- Train a supervised model to predict high_risk_60d.
- Build an unsupervised clustering pipeline for seller segmentation.
- Compare the two approaches in terms of objective, inputs, outputs, and evaluation.
- Recommend when Amazon should use each method in production.
- Provide feature importance and cluster profiles that operations teams can act on.
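The clustering deliverable can be sketched in three steps: standardize the features, cluster the sellers, then profile each cluster in raw units so operations teams read familiar metrics. The sketch uses plain-Python Lloyd's k-means for self-containment. The value of k, the seed, and the two example features are assumptions. Categorical profile fields such as marketplace or business_type would need encoding first. A production pipeline would more likely use `sklearn.cluster.KMeans` or SageMaker's built-in k-means.

```python
from __future__ import annotations

import random
from statistics import mean, pstdev


def standardize(X: list[list[float]]) -> list[list[float]]:
    """Z-score each column so no metric dominates distances by scale alone."""
    cols = list(zip(*X))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in X]


def sq_dist(a: list[float], b: list[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(X: list[list[float]], k: int, n_iter: int = 50, seed: int = 0) -> tuple[list[int], list[list[float]]]:
    """Lloyd's k-means with seeded random initial centroids for reproducibility."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(X, k)]
    labels = [0] * len(X)
    for _ in range(n_iter):
        # Assign each seller to its nearest centroid.
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in X]
        new = []
        for j in range(k):
            members = [p for p, lab in zip(X, labels) if lab == j]
            # Keep the old centroid if a cluster empties out.
            new.append([mean(c) for c in zip(*members)] if members else centroids[j])
        if new == centroids:
            break
        centroids = new
    return labels, centroids


def cluster_profiles(raw: list[list[float]], labels: list[int], names: list[str]) -> dict[int, dict[str, float]]:
    """Per-cluster size and mean of each raw (unscaled) feature for ops review."""
    profiles = {}
    for j in sorted(set(labels)):
        rows = [r for r, lab in zip(raw, labels) if lab == j]
        profiles[j] = {"size": len(rows), **{n: mean(c) for n, c in zip(names, zip(*rows))}}
    return profiles
```

Profiling on raw rather than standardized values keeps segment summaries in units such as on_time_ship_rate that reviewers already use.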