You own a binary image moderation model that screens user-uploaded creative assets before they are published on a digital design platform. The current system is a ResNet-based classifier that auto-blocks uploads scoring above 0.70, auto-approves low-risk uploads, and sends the middle band to human review. Over the last six weeks, Trust & Safety has escalated reports that more policy-violating assets are reaching production, even though dashboard accuracy still looks stable. You are asked to evaluate whether the model is degrading, whether the threshold is still appropriate, and what should change.
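The three-way routing described above can be pictured with a minimal sketch. Only the 0.70 auto-block cutoff comes from the scenario; the lower cutoff `APPROVE_THRESHOLD` and the function name `route_upload` are hypothetical, chosen purely for illustration.

```python
from typing import Literal

Decision = Literal["auto_block", "human_review", "auto_approve"]

BLOCK_THRESHOLD = 0.70    # stated in the scenario
APPROVE_THRESHOLD = 0.20  # hypothetical: the scenario does not give the lower cutoff


def route_upload(
    score: float,
    block_threshold: float = BLOCK_THRESHOLD,
    approve_threshold: float = APPROVE_THRESHOLD,
) -> Decision:
    """Map a classifier score in [0, 1] to one of three moderation outcomes."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    if score > block_threshold:
        # Scores strictly above the block threshold are auto-blocked.
        return "auto_block"
    if score < approve_threshold:
        # Scores below the (assumed) lower cutoff are published without review.
        return "auto_approve"
    # Everything in between goes to the human review queue.
    return "human_review"
```

Under this framing, an unsafe asset reaches production when it scores below the lower cutoff or when it is approved in human review, so both cutoffs matter to the evaluation, not only 0.70.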

| Metric | Validation at Launch | Current Production |
|---|---|---|
| Accuracy | 96.8% | 96.1% |
| Precision | 81.4% | 79.8% |
| Recall | 88.7% | 63.2% |
| F1 Score | 84.9% | 70.5% |
| AUC-ROC | 0.942 | 0.903 |
| Policy-violating prevalence | 4.1% | 7.8% |
| Auto-block rate | 6.9% | 6.2% |
| Unsafe assets approved per 100k uploads | 463 | 2,870 |
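As a quick consistency check, the last row of the table can be reproduced from the prevalence and recall rows. The sketch below assumes that every policy-violating asset the model misses ends up approved, which is a simplification given the human review band; the helper name `missed_per_100k` is hypothetical.

```python
def missed_per_100k(prevalence: float, recall: float) -> float:
    """Expected violating assets that slip through per 100k uploads.

    Assumes every missed violation is approved (a simplification).
    """
    return 100_000 * prevalence * (1.0 - recall)


launch = missed_per_100k(prevalence=0.041, recall=0.887)
current = missed_per_100k(prevalence=0.078, recall=0.632)

print(round(launch))   # 463
print(round(current))  # 2870

# Split the increase into its two multiplicative factors.
prevalence_factor = 0.078 / 0.041                 # more violating uploads arriving
miss_rate_factor = (1 - 0.632) / (1 - 0.887)      # larger share of them missed
print(f"prevalence x{prevalence_factor:.2f}, miss rate x{miss_rate_factor:.2f}")
```

Under that assumption, the rise in escaped violations comes from two effects that multiply: more violating content is arriving, and a larger fraction of it is being missed. Accuracy barely moves because the negative class still dominates the upload volume.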

How would you interpret these results, diagnose the likely failure mode, and recommend the most important evaluation and model changes before the next release?