Business Context
PromptForge, an AI tooling company, is building an internal knowledge assistant for its ML platform team. The team receives hundreds of questions each week about deployment, monitoring, prompt management, and governance for large language model systems, and wants an NLP pipeline that automatically categorizes these questions into LLMOps topics.
Data
You are given a corpus of 180,000 historical support tickets and Slack questions related to LLM platforms.
- Task: Classify each question into one of 5 LLMOps categories: deployment, monitoring, evaluation, governance, cost-optimization
- Text length: 10-220 words, median 42 words
- Language: English only
- Label distribution: moderately imbalanced; deployment and monitoring together make up ~55% of examples
- Noise: duplicate tickets, code snippets, URLs, log fragments, and inconsistent product names
The goal is not to define LLMOps abstractly, but to build a practical NLP system that can route LLMOps-related questions correctly.
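The noise described above (code snippets, URLs, log fragments, duplicates) can be handled with lightweight normalization before tokenization. A minimal sketch, assuming simple regex-based cleaning; the placeholder tokens and log-line pattern are illustrative choices, not part of the spec:

```python
import re

URL_RE = re.compile(r"https?://\S+")
CODE_FENCE_RE = re.compile(r"```.*?```", re.DOTALL)
# Heuristic for log fragments: lines starting with a date or a log level.
LOG_LINE_RE = re.compile(
    r"^\s*(?:\[?\d{4}-\d{2}-\d{2}|ERROR|WARN|INFO|DEBUG)\b.*$", re.MULTILINE
)

def clean_ticket(text: str) -> str:
    """Replace code fences and URLs with placeholder tokens, drop log-like lines."""
    text = CODE_FENCE_RE.sub(" <CODE> ", text)
    text = URL_RE.sub(" <URL> ", text)
    text = LOG_LINE_RE.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()

def dedupe(tickets):
    """Drop exact duplicates after cleaning (case-insensitive)."""
    seen, out = set(), []
    for t in tickets:
        key = clean_ticket(t).lower()
        if key and key not in seen:
            seen.add(key)
            out.append(t)
    return out
```

Keeping placeholder tokens like `<CODE>` (rather than deleting the spans outright) preserves the signal that a ticket contained code, which is itself predictive for categories such as deployment.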
Success Criteria
A production-ready solution should achieve:
- Macro-F1 >= 0.84 on a held-out test set
- Recall >= 0.90 for monitoring and governance
- Inference latency < 120ms per query for online routing
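The macro-F1 and per-class recall targets above can be checked directly with scikit-learn. A minimal sketch of an acceptance check; the function name and return shape are illustrative:

```python
from sklearn.metrics import f1_score, recall_score

def meets_criteria(y_true, y_pred, labels):
    """Check the routing targets: macro-F1 >= 0.84 overall and
    recall >= 0.90 for the monitoring and governance classes."""
    macro_f1 = f1_score(y_true, y_pred, labels=labels, average="macro")
    per_class = recall_score(y_true, y_pred, labels=labels, average=None)
    recall_by_label = dict(zip(labels, per_class))
    ok = macro_f1 >= 0.84 and all(
        recall_by_label[c] >= 0.90 for c in ("monitoring", "governance")
    )
    return ok, macro_f1, recall_by_label
```

Macro-F1 is the right headline metric here because it weights all five classes equally, so the ~55% deployment/monitoring majority cannot mask poor performance on the smaller classes.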
Constraints
- Training must fit on a single T4 GPU; inference must run on CPU
- Model artifact should remain under 500MB
- The pipeline should be easy to retrain weekly as new labeled questions arrive
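The 500MB artifact budget can be sanity-checked from parameter count alone. A back-of-envelope sketch, assuming fp32 weights and a DistilBERT-class encoder (~66M parameters, an illustrative choice):

```python
def artifact_size_mb(num_params: int, bytes_per_param: int = 4) -> float:
    """Rough serialized-weights size; 4 bytes/param for fp32."""
    return num_params * bytes_per_param / 1e6

# A DistilBERT-sized model (~66M params) is ~264MB in fp32,
# comfortably under the 500MB budget; int8 quantization would
# cut that roughly 4x and also speed up CPU inference.
assert artifact_size_mb(66_000_000) < 500
```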
Requirements
- Build a multi-class text classification pipeline for LLMOps questions.
- Design preprocessing for noisy technical text, including logs and code fragments.
- Fine-tune a modern transformer model in Python.
- Evaluate the model with class-aware metrics and confusion analysis.
- Briefly explain why this system is useful for AI development workflows and operational support.