Business Context
GitLab wants to improve issue intake by automatically routing new issues and support-style requests to the right product area and urgency bucket. You need to design and fine-tune a large language model for this feature so that it can power assisted triage in GitLab Issues and GitLab Duo workflows.
Data
You have 2.4 million historical GitLab issues, support tickets, and internal triage comments collected over 30 months.
- Task: predict both product area (12 classes such as CI/CD, Source Code, Merge Requests, Security, Runners, Observability) and priority (P1-P4)
- Volume: 2.4M records, with 1.8M usable labeled examples after filtering noisy labels
- Text length: 20-2,500 tokens (median 220)
- Language: English 88%, Japanese 5%, German 4%, other 3%
- Label distribution: highly imbalanced; some product areas are <3% of examples, and P1 is <1%
- Inputs: title, description, stack trace/log snippets, labels, and optional comments
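The input fields above have to be flattened into a single text sequence before tokenization. A minimal sketch of that assembly step, in which the section markers, field order, `build_input` helper, and log-truncation limit are all illustrative assumptions rather than part of the spec:

```python
def build_input(title, description, logs="", labels=(), comments=(),
                max_log_chars=1000):
    """Flatten one issue record into a single string for the encoder.

    Illustrative sketch: the section markers and the log-truncation
    limit are assumptions, not part of the spec.
    """
    parts = [f"TITLE: {title}", f"BODY: {description}"]
    if logs:
        # Stack traces/logs are long and repetitive; keep the head, which
        # usually carries the error type and the first frames.
        parts.append(f"LOGS: {logs[:max_log_chars]}")
    if labels:
        parts.append("LABELS: " + ", ".join(labels))
    for c in comments:
        parts.append(f"COMMENT: {c}")
    return "\n".join(parts)

example = build_input(
    "Runner fails on Kubernetes executor",
    "Jobs stay pending after upgrade.",
    logs="ERROR: failed to connect to kube API\n" * 50,
    labels=["runner", "kubernetes"],
)
```

Truncating logs from the head (rather than the tail) is one way to keep inputs under the encoder's context window; a fuller pipeline would also handle the long tail of 2,500-token issues, e.g. by chunking or by a long-context encoder.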
Success Criteria
A production-ready solution should achieve:
- Macro-F1 >= 0.82 on product area classification
- Recall >= 0.92 for P1 issues
- Inference latency < 300 ms per issue at batch size 8
- Clear improvement over a TF-IDF + linear baseline and a prompt-only zero-shot baseline
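The macro-F1 and P1-recall gates above are simple to compute exactly; a small pure-Python sketch of both metrics (in practice `sklearn.metrics.f1_score` with `average="macro"` and `recall_score` give the same numbers):

```python
def per_class_prf(y_true, y_pred, cls):
    """Precision, recall, F1 for a single class, one-vs-rest."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1

def macro_f1(y_true, y_pred, classes):
    """Unweighted mean of per-class F1 -- the product-area target metric.
    Every class counts equally, so rare areas cannot hide behind common ones."""
    return sum(per_class_prf(y_true, y_pred, c)[2] for c in classes) / len(classes)

# Toy check on priority labels, where recall on "P1" is the gating number.
y_true = ["P1", "P2", "P1", "P3", "P2"]
y_pred = ["P1", "P2", "P2", "P3", "P2"]
p1_recall = per_class_prf(y_true, y_pred, "P1")[1]  # 0.5: one of two P1s found
```

Macro averaging (rather than micro) is what makes the 0.82 target meaningful here: with some product areas below 3% of examples, a micro-averaged score could hit the number while ignoring them entirely.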
Constraints
- Training must run in GitLab CI/CD on limited GPU capacity
- Sensitive customer text cannot leave GitLab-controlled infrastructure
- The model must support weekly refreshes and rollback-safe deployment
Requirements
- Propose a fine-tuning approach for this multi-task NLP problem.
- Define a realistic preprocessing pipeline for issue text, logs, and multilingual content.
- Implement training and evaluation in modern Python using Hugging Face Transformers.
- Explain how you would handle class imbalance, noisy labels, and long inputs.
- Describe how you would validate the model offline before enabling it in GitLab production.