Business Context
GitLab wants to improve issue intake by automatically routing new issues and support-style requests to the right product area and urgency bucket. You need to design and fine-tune a large language model for this feature so that it can power assisted triage in GitLab Issues and GitLab Duo workflows.
Data
You have 2.4 million historical GitLab issues, support tickets, and internal triage comments collected over 30 months.
- Task: predict both product area (12 classes such as CI/CD, Source Code, Merge Requests, Security, Runners, Observability) and priority (P1-P4)
- Volume: 2.4M records, with 1.8M usable labeled examples after filtering noisy labels
- Text length: 20-2,500 tokens (median 220)
- Language: English 88%, Japanese 5%, German 4%, other 3%
- Label distribution: highly imbalanced; some product areas are <3% of examples, and P1 is <1%
- Inputs: title, description, stack trace/log snippets, labels, and optional comments
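The input fields above have to be flattened into a single text sequence before tokenization. A minimal sketch of that assembly step, in which the section markers, field order, `build_input` helper, and log-truncation limit are all illustrative assumptions rather than part of the spec:

```python
def build_input(title, description, logs="", labels=(), comments=(),
                max_log_chars=1000):
    """Flatten one issue record into a single string for the encoder.

    Illustrative sketch: the section markers and the log-truncation
    limit are assumptions, not part of the spec.
    """
    parts = [f"TITLE: {title}", f"BODY: {description}"]
    if logs:
        # Stack traces/logs are long and repetitive; keep the head, which
        # usually carries the error type and the first frames.
        parts.append(f"LOGS: {logs[:max_log_chars]}")
    if labels:
        parts.append("LABELS: " + ", ".join(labels))
    for c in comments:
        parts.append(f"COMMENT: {c}")
    return "\n".join(parts)

example = build_input(
    "Runner fails on Kubernetes executor",
    "Jobs stay pending after upgrade.",
    logs="ERROR: failed to connect to kube API\n" * 50,
    labels=["runner", "kubernetes"],
)
```

Truncating logs from the head (rather than the tail) is one way to keep inputs under the encoder's context window; a fuller pipeline would also handle the long tail of 2,500-token issues, e.g. by chunking or by a long-context encoder.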
Success Criteria
A production-ready solution should achieve:
- Macro-F1 >= 0.82 on product area classification
- Recall >= 0.92 for P1 issues
- Inference latency < 300 ms per issue at batch size 8
- Clear improvement over a TF-IDF + linear baseline and a prompt-only zero-shot baseline
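The macro-F1 and P1-recall gates above are simple to compute exactly; a small pure-Python sketch of both metrics (in practice `sklearn.metrics.f1_score` with `average="macro"` and `recall_score` give the same numbers):

```python
def per_class_prf(y_true, y_pred, cls):
    """Precision, recall, F1 for a single class, one-vs-rest."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1

def macro_f1(y_true, y_pred, classes):
    """Unweighted mean of per-class F1 -- the product-area target metric.
    Every class counts equally, so rare areas cannot hide behind common ones."""
    return sum(per_class_prf(y_true, y_pred, c)[2] for c in classes) / len(classes)

# Toy check on priority labels, where recall on "P1" is the gating number.
y_true = ["P1", "P2", "P1", "P3", "P2"]
y_pred = ["P1", "P2", "P2", "P3", "P2"]
p1_recall = per_class_prf(y_true, y_pred, "P1")[1]  # 0.5: one of two P1s found
```

Macro averaging (rather than micro) is what makes the 0.82 target meaningful here: with some product areas below 3% of examples, a micro-averaged score could hit the number while ignoring them entirely.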
Constraints
- Training must run in GitLab CI/CD on limited GPU capacity
- Sensitive customer text cannot leave GitLab-controlled infrastructure
- The model must support weekly refreshes and rollback-safe deployment
Requirements
- Propose a fine-tuning approach for this multi-task NLP problem.
- Define a realistic preprocessing pipeline for issue text, logs, and multilingual content.
- Implement training and evaluation in modern Python using Hugging Face Transformers.
- Explain how you would handle class imbalance, noisy labels, and long inputs.
- Describe how you would validate the model offline before enabling it in GitLab production.