Business Context
Meta's Integrity team wants a model to predict whether a newly created Facebook account will be actioned for coordinated spam or fake engagement within 7 days. You need to compare bagging and boosting in a realistic classification setting and recommend which ensemble approach should be deployed.
Dataset
You are given a training table built from account creation, graph, and early activity signals.
| Feature Group | Count | Examples |
|---|---|---|
| Account metadata | 8 | account_age_hours, signup_surface, country, device_os |
| Activity features | 14 | posts_first_24h, friend_requests_sent, groups_joined, outbound_message_count |
| Graph features | 9 | accepted_request_rate, clustering_coefficient, mutual_friends_p50 |
| Integrity heuristics | 6 | prior_device_risk, IP_reputation_score, velocity_bucket |
| Temporal features | 5 | hour_of_day_created, weekend_signup, session_gap_minutes |
- Size: 420K accounts, 42 features
- Target: enforced_7d (1 if the account is actioned within 7 days, else 0)
- Class balance: 6.4% positive, 93.6% negative
- Missing data: ~12% missing in graph features for cold-start accounts; ~4% missing in device-level fields
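The class balance above fixes some useful reference numbers before any modeling. The short calculation below is only a sketch of that arithmetic: it derives the approximate positive count, the negative-to-positive ratio (a common starting point for class weighting), and the PR-AUC a random ranker would be expected to reach, which is roughly the positive rate. The variable names are illustrative and not part of the dataset.

```python
# Figures taken from the dataset description above.
n_accounts = 420_000
positive_rate = 0.064

n_pos = round(n_accounts * positive_rate)   # approximate count of actioned accounts
n_neg = n_accounts - n_pos                  # approximate count of non-actioned accounts
neg_per_pos = n_neg / n_pos                 # candidate class weight for the positive class

# A ranker with no signal has expected PR-AUC close to the prevalence.
chance_pr_auc = n_pos / n_accounts

print(n_pos, n_neg, round(neg_per_pos, 3), round(chance_pr_auc, 3))
```

With roughly 26,880 positives against 393,120 negatives (a ratio of 14.625 to 1), any PR-AUC should be read against a chance level near 0.064 rather than against 0.5.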
Success Criteria
A good solution should:
- Beat a single decision tree baseline by a meaningful margin
- Show a clear comparison between a bagging method and a boosting method
- Achieve PR-AUC >= 0.42 on the test set, with recall >= 0.75 at a decision threshold where precision >= 0.30
- Explain which method is preferable for Meta's production constraints
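The "recall at fixed precision" criterion can be made concrete with a threshold sweep. The function below is a minimal pure-Python sketch, not a prescribed implementation: `recall_at_precision` is a hypothetical helper name, and it assumes binary 0/1 labels and scores where higher means more likely to be actioned. It reports the highest recall over all thresholds whose precision meets the floor.

```python
def recall_at_precision(y_true, scores, min_precision=0.30):
    """Highest recall over all score thresholds with precision >= min_precision."""
    total_pos = sum(y_true)
    if total_pos == 0:
        raise ValueError("need at least one positive label")

    # Rank accounts from highest to lowest score.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)

    best_recall = 0.0
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # A threshold can only sit between distinct scores, so evaluate
        # precision and recall at the end of each run of tied scores.
        end_of_tie = i + 1 == len(ranked) or ranked[i + 1][0] != score
        if end_of_tie and tp / (tp + fp) >= min_precision:
            best_recall = max(best_recall, tp / total_pos)
    return best_recall


# Toy data for illustration only: 3 positives among 10 accounts.
demo_labels = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0]
demo_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
demo_recall = recall_at_precision(demo_labels, demo_scores, min_precision=0.60)
```

On the toy data, the strictest threshold that still keeps precision at or above 0.60 captures two of the three positives. Against the success criterion, the same call with `min_precision=0.30` on test-set scores would need to return at least 0.75.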
Constraints
- Batch scoring runs every 15 minutes; p95 inference latency should stay under 50 ms per 1K accounts
- The model must support feature importance analysis for Integrity analysts
- Retraining happens weekly; the pipeline should tolerate moderate feature drift
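One way to put a number on "moderate feature drift" between weekly retrains is the population stability index (PSI), computed per feature between a reference sample and a recent scoring sample. The sketch below is an assumption about how this could be checked, not a requirement of the task. `population_stability_index` is a hypothetical helper; it uses equal-width bins over the reference range (quantile bins are a common alternative) and assumes numeric inputs with missing values already removed.

```python
import math


def population_stability_index(expected, actual, n_bins=10, eps=1e-6):
    """PSI of one numeric feature: reference sample vs. recent sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for a constant feature

    def bin_shares(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            # Values outside the reference range are clamped to the edge bins.
            counts[min(max(idx, 0), n_bins - 1)] += 1
        # Floor each share at eps so the logarithm stays defined for empty bins.
        return [max(c / len(values), eps) for c in counts]

    e, a = bin_shares(expected), bin_shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Toy data for illustration only.
reference = [i / 100 for i in range(100)]
shifted = [x + 0.3 for x in reference]
psi_same = population_stability_index(reference, reference)
psi_shifted = population_stability_index(reference, shifted)
```

Identical distributions give a PSI of zero, and the value grows as the recent sample moves away from the reference. A commonly cited rule of thumb treats values above roughly 0.25 as a significant shift, but any alert threshold here would need to be tuned against observed week-to-week variation.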
Deliverables
- Train a single decision tree baseline, a bagging model, and a boosting model.
- Explain the conceptual difference between bagging and boosting in the context of this dataset.
- Compare performance using PR-AUC, ROC-AUC, recall at fixed precision, and calibration.
- Recommend one approach for deployment and justify the tradeoffs.
- Describe how you would monitor degradation after launch.