Business Context
OpenAI needs to retrain a text safety classifier used in moderation pipelines for user prompts and model outputs. The goal is not just to reach good validation performance, but to choose an optimizer that converges reliably under production constraints and remains stable as data distributions shift.
Dataset
You are given a precomputed feature dataset derived from OpenAI moderation examples. Each row represents one text sample after embedding and metadata featurization.
| Feature Group | Count | Examples |
|---|---|---|
| Embedding features | 1536 | text_embedding_0 ... text_embedding_1535 |
| Text metadata | 8 | char_count, token_count, url_count, uppercase_ratio |
| Source context | 4 | surface, language, user_tier, model_family |
| Label | 1 | unsafe_content |
- Size: 420K examples; 1,548 input features plus 1 label column (1,549 columns total)
- Target: Binary classification — unsafe content (1) vs allowed content (0)
- Class balance: 18% positive, 82% negative
- Missing data: ~6% missing in metadata fields; embeddings are complete
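As a concrete starting point, here is a minimal loading and preprocessing sketch. The file name `moderation_features.parquet`, median imputation for the missing metadata values, and one-hot encoding of the source-context columns are illustrative assumptions, not a prescribed pipeline.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical file name; substitute the actual dataset path.
df = pd.read_parquet("moderation_features.parquet")

# Only the four metadata fields named in the brief are listed here;
# extend to all eight in the real dataset.
META_COLS = ["char_count", "token_count", "url_count", "uppercase_ratio"]
CONTEXT_COLS = ["surface", "language", "user_tier", "model_family"]

# ~6% of metadata values are missing; median imputation is one simple choice (assumption).
df[META_COLS] = df[META_COLS].fillna(df[META_COLS].median())

# Source-context fields are categorical; one-hot encode them (assumption).
df = pd.get_dummies(df, columns=CONTEXT_COLS)

y = df.pop("unsafe_content")
X = df

# Stratified splits preserve the 18/82 class balance in every partition.
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42
)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.50, stratify=y_tmp, random_state=42
)
```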
Success Criteria
A strong solution should:
- Achieve AUC-ROC >= 0.93 and PR-AUC >= 0.78 on the held-out test set
- Compare SGD, RMSprop, and Adam using the same model, initialization, and data split (a minimal comparison loop is sketched after this list)
- Explain optimizer behavior in terms of convergence speed, sensitivity to learning rate, and generalization
- Produce a training setup that supports weekly retraining and a model that can be scored in batch or via low-latency online inference
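The comparison above is easiest to keep fair as one loop over optimizer constructors, so the architecture, seed, and split stay fixed and only the update rule varies. A minimal PyTorch sketch, assuming a `train_loader` (a `torch.utils.data.DataLoader` over the training split) and a hypothetical two-hidden-layer network; the learning rates shown are placeholders, and each optimizer typically needs its own sweep:

```python
import torch
from torch import nn

def make_model(n_features: int) -> nn.Module:
    # Same architecture and same seed for every optimizer run,
    # so only the update rule differs between experiments.
    torch.manual_seed(0)
    return nn.Sequential(
        nn.Linear(n_features, 256), nn.ReLU(),
        nn.Linear(256, 64), nn.ReLU(),
        nn.Linear(64, 1),  # logit for unsafe_content
    )

# Placeholder learning rates (assumption); sweep per optimizer in practice.
optimizers = {
    "sgd": lambda p: torch.optim.SGD(p, lr=1e-2, momentum=0.9),
    "rmsprop": lambda p: torch.optim.RMSprop(p, lr=1e-3),
    "adam": lambda p: torch.optim.Adam(p, lr=1e-3),
}

# pos_weight compensates for the 18/82 class imbalance.
loss_fn = nn.BCEWithLogitsLoss(pos_weight=torch.tensor(82.0 / 18.0))

def train_one(opt_factory, loader, n_features, epochs=5):
    model = make_model(n_features)
    opt = opt_factory(model.parameters())
    for _ in range(epochs):
        for xb, yb in loader:
            opt.zero_grad()
            loss = loss_fn(model(xb).squeeze(-1), yb.float())
            loss.backward()
            opt.step()
    return model

# Usage, given a train_loader built from the split above:
# models = {name: train_one(f, train_loader, n_features) for name, f in optimizers.items()}
```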
Constraints
- Training budget is limited to 2 GPU-hours per full experiment sweep
- Inference must remain under 20 ms p95 per example in the online moderation path
- The solution should be explainable enough for ML engineers to debug optimizer instability and training regressions
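For the debuggability requirement, one lightweight approach is to log the global gradient norm each step; spikes or collapses in this signal usually surface optimizer instability before the loss curve does. A sketch, where the threshold and print-based logging are illustrative assumptions:

```python
import torch

def global_grad_norm(model: torch.nn.Module) -> float:
    # L2 norm over all parameter gradients; call after loss.backward().
    total = 0.0
    for p in model.parameters():
        if p.grad is not None:
            total += p.grad.detach().pow(2).sum().item()
    return total ** 0.5

# Inside the training loop, after loss.backward():
# norm = global_grad_norm(model)
# if norm > 100.0:  # illustrative threshold (assumption)
#     print(f"step {step}: gradient norm spike {norm:.1f}")
```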
Deliverables
- Implement a neural network classifier and train it with SGD, RMSprop, and Adam.
- Describe gradient descent and how each optimizer updates parameters (the standard update rules are given after this list for reference).
- Compare train/validation curves, final metrics, and optimizer stability.
- Recommend one optimizer for production and justify the choice.
- Identify key hyperparameters to tune and likely failure modes during retraining.
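For reference, the standard update rules behind the second deliverable, with learning rate $\eta$, gradient $g_t = \nabla L(\theta_t)$, and the usual library-default hyperparameters ($\mu$, $\rho$, $\beta_1$, $\beta_2$, $\epsilon$):

```latex
% Gradient descent / SGD (optionally with momentum \mu):
\theta_{t+1} = \theta_t - \eta\, g_t
\qquad\text{or}\qquad
v_{t+1} = \mu v_t + g_t,\quad \theta_{t+1} = \theta_t - \eta\, v_{t+1}

% RMSprop: per-parameter step sizes from a running average of squared gradients.
s_{t+1} = \rho\, s_t + (1-\rho)\, g_t^2,\qquad
\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{s_{t+1}} + \epsilon}\, g_t

% Adam: momentum plus RMSprop-style scaling, with bias correction.
m_{t+1} = \beta_1 m_t + (1-\beta_1)\, g_t,\qquad
v_{t+1} = \beta_2 v_t + (1-\beta_2)\, g_t^2
\hat m = \frac{m_{t+1}}{1-\beta_1^{\,t+1}},\qquad
\hat v = \frac{v_{t+1}}{1-\beta_2^{\,t+1}},\qquad
\theta_{t+1} = \theta_t - \frac{\eta\, \hat m}{\sqrt{\hat v} + \epsilon}
```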