You’re the on-call ML scientist for StreamShare, a short-form video platform with 45M DAUs and a heavy teen user base. StreamShare uses a binary classifier to detect and block high-severity policy violations at upload time (e.g., self-harm encouragement, explicit sexual content involving minors, and credible threats). The model is a transformer-based text+vision ensemble that outputs a probability score p(violation); thresholds on that score then decide whether to auto-block, send to human review, or allow.
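A three-way decision like this is typically implemented with two cutoffs on p(violation). A minimal sketch, with illustrative threshold values (not StreamShare's actual production settings):

```python
def route(p_violation, t_block=0.90, t_review=0.50):
    """Route an upload based on the model's violation probability.

    t_block and t_review are placeholder values for illustration,
    not the real production thresholds.
    """
    if p_violation >= t_block:
        return "auto_block"      # high confidence: block without review
    if p_violation >= t_review:
        return "human_review"    # uncertain: queue for Safety Ops
    return "allow"               # low risk: publish

print(route(0.95))  # auto_block
print(route(0.60))  # human_review
print(route(0.10))  # allow
```

Note that moving either cutoff trades precision/recall against review volume, which is why the review-queue metrics below matter as much as the classifier metrics.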
The company has a strict internal goal: keep “severe harm exposure” below a fixed budget because of regulatory scrutiny (EU DSA / UK OSA) and advertiser requirements. The policy team defines a “severe harm incident” as: a violating piece of content that is viewed by at least one user before it is removed or age-gated. The Safety Ops team can review up to 120,000 items/day globally.
Three weeks ago, you shipped a new model (Model B) that looked better offline than the previous production model (Model A). However, Trust & Safety reports that user-reported severe incidents increased week-over-week after the launch, even though the model’s offline metrics improved.
Holdout dataset: 10M labeled items from the last 60 days (labels from a mix of human review and post-hoc enforcement). Severe violations are rare.
| Metric (offline) | Model A (prod) | Model B (new) |
|---|---|---|
| AUC-ROC | 0.962 | 0.975 |
| Precision @ current threshold | 0.41 | 0.37 |
| Recall @ current threshold | 0.78 | 0.84 |
| F1 @ current threshold | 0.54 | 0.52 |
| Calibration (ECE) | 0.061 | 0.094 |
| % items sent to review | 1.10% | 1.35% |
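The calibration row above (ECE) measures how far predicted probabilities drift from empirical violation rates; Model B's worse ECE (0.094 vs. 0.061) means its scores are less trustworthy as probabilities even though its ranking (AUC) improved. A sketch of the standard binned ECE computation:

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """Binned ECE: weighted mean |empirical rate - mean predicted prob| per bin.

    probs:  predicted p(violation) in [0, 1]
    labels: 0/1 ground truth
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        # include 1.0 in the last bin
        mask = (probs >= lo) & ((probs < hi) if hi < 1.0 else (probs <= hi))
        if mask.any():
            conf = probs[mask].mean()   # mean predicted probability in bin
            acc = labels[mask].mean()   # empirical violation rate in bin
            ece += mask.mean() * np.abs(acc - conf)
    return float(ece)

# Perfectly calibrated bin: 10% predicted, 10% observed -> ECE 0
print(expected_calibration_error([0.1] * 10, [1] + [0] * 9))  # 0.0
```

With severe violations being rare, a per-bin reliability check on the high-score region is usually more informative than the single aggregate number.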
Traffic is split 50/50 by uploader (A/B). Policy and enforcement rules are unchanged.
| Metric (online) | Model A | Model B |
|---|---|---|
| Severe harm incidents per 10K uploads | 1.8 | 2.4 |
| User reports per 10K uploads | 14.2 | 16.1 |
| Median time-to-action (minutes) | 18 | 26 |
| Review queue overflow rate | 2% | 11% |
| Creator appeal rate (blocked content) | 3.1% | 4.8% |
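One back-of-envelope check connects the offline review-rate change to the online overflow: at a sufficiently large upload volume, Model B's 1.35% review rate exceeds the 120,000 items/day global budget where Model A's 1.10% did not, which would explain the overflow rate and the slower time-to-action. The upload volume below is hypothetical (the actual figure isn't given in the brief):

```python
REVIEW_CAPACITY_PER_DAY = 120_000   # Safety Ops global budget (from the brief)
UPLOADS_PER_DAY = 10_000_000        # hypothetical volume, for illustration only

def review_load(review_rate, uploads=UPLOADS_PER_DAY):
    """Expected daily items routed to human review at a given review rate."""
    return review_rate * uploads

load_a = review_load(0.0110)  # Model A's offline review rate at full rollout
load_b = review_load(0.0135)  # Model B's offline review rate at full rollout

print(load_a, load_a <= REVIEW_CAPACITY_PER_DAY)  # 110000.0 True
print(load_b, load_b <= REVIEW_CAPACITY_PER_DAY)  # 135000.0 False
```

Under this assumption, Model B saturates the queue: items wait longer before action, and a violating item that is viewed even once while waiting counts as a severe harm incident under the policy team's definition.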
Despite better offline AUC and higher offline recall, Model B is associated with more severe harm exposure online, slower enforcement, and review-capacity overflow. Leadership asks you two things:
Provide a structured answer that covers: