Business Context
QueryFind, a consumer search platform, wants to annotate search results based on how well each result satisfies the user’s need for a given query. These annotations will be used to train downstream ranking models and improve search quality.
Data
You are given query-result pairs with human relevance judgments collected from editorial raters.
- Volume: 850,000 labeled query-result pairs from the last 12 months
- Text fields: query text, result title, snippet, URL path, and optional landing-page body text
- Text length: queries are 2-12 tokens; snippets are 20-180 tokens; landing pages are truncated to 512 tokens
- Language: English only
- Label distribution: Fully Satisfies 11%, Highly Satisfies 19%, Moderately Satisfies 33%, Slightly Satisfies 22%, Fails to Satisfy 15%
Success Criteria
A strong solution should achieve macro-F1 >= 0.78 and weighted F1 >= 0.84 while keeping confusion between adjacent relevance classes low (e.g., Highly Satisfies predicted as Moderately Satisfies). Because these labels feed ranking systems, predicted probabilities must be well calibrated and behave consistently with the ordinal label scale; a sketch of the headline metrics follows.
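Where the exact definitions matter, the headline metrics can be computed with scikit-learn plus a simple binned calibration error. A minimal sketch, assuming hypothetical numpy arrays `y_true`/`y_pred` of integer grades 0 (Fails to Satisfy) through 4 (Fully Satisfies) and an `(n, 5)` probability matrix `y_prob`:

```python
import numpy as np
from sklearn.metrics import f1_score

def headline_metrics(y_true, y_pred, y_prob, n_bins=10):
    """Macro/weighted F1 plus a simple expected calibration error (ECE)."""
    macro_f1 = f1_score(y_true, y_pred, average="macro")
    weighted_f1 = f1_score(y_true, y_pred, average="weighted")

    # ECE: bin predictions by confidence, compare accuracy to confidence.
    conf = y_prob.max(axis=1)
    correct = (y_pred == y_true).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
    return {"macro_f1": macro_f1, "weighted_f1": weighted_f1, "ece": ece}
```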
Constraints
- Batch inference on 5M query-result pairs per day
- Per-pair inference latency under 80ms on a T4 GPU
- Model must be deployable in Python and exportable to ONNX (an export sketch follows this list)
- Training must fit on a single 16GB GPU
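The ONNX constraint is worth verifying early, since some architectures trace more cleanly than others. A minimal export sketch, assuming a fine-tuned Hugging Face sequence-classification checkpoint at a hypothetical `checkpoints/relevance-model` path:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Hypothetical checkpoint directory for the fine-tuned model.
# torchscript=True makes the model return plain tuples, which traces cleanly.
model = AutoModelForSequenceClassification.from_pretrained(
    "checkpoints/relevance-model", torchscript=True)
tokenizer = AutoTokenizer.from_pretrained("checkpoints/relevance-model")
model.eval()

# A dummy query/document pair traces the graph; dynamic axes keep batch
# size and sequence length flexible for batch inference.
dummy = tokenizer("example query", "example result text", return_tensors="pt")
torch.onnx.export(
    model,
    (dummy["input_ids"], dummy["attention_mask"]),
    "relevance-model.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["logits"],
    dynamic_axes={
        "input_ids": {0: "batch", 1: "seq"},
        "attention_mask": {0: "batch", 1: "seq"},
        "logits": {0: "batch"},
    },
    opset_version=17,
)
```

The exported graph can then be benchmarked with ONNX Runtime on a T4 to confirm the 80ms per-pair budget holds at the intended batch size.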
Requirements
- Build an NLP model that predicts how well a search result satisfies the user need behind a query.
- Define a realistic preprocessing pipeline for the query, title, snippet, URL path, and page text (see the input-packing sketch below).
- Implement training and evaluation in modern Python using transformer-based fine-tuning (a Trainer-style sketch follows).
- Explain how class imbalance and ordinal confusion between neighboring labels would be handled (see the weighted-loss sketch below).
- Describe how the model would be validated before its outputs feed a search ranking pipeline (see the validation-report sketch at the end).
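For the preprocessing requirement, one workable packing puts the query on the first segment and the concatenated result fields on the second, so truncation never eats the query. A sketch assuming a DeBERTa-v3-small backbone and a 256-token budget (both assumptions, sized with the latency constraint in mind):

```python
from transformers import AutoTokenizer

# Assumed backbone and length budget; both are tunable choices.
tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-small")
MAX_LEN = 256

def build_inputs(query, title, snippet, url_path, page_text=None):
    """Pack (query, result fields) as a sentence pair; truncate only the
    result side so the query always survives intact."""
    sep = f" {tokenizer.sep_token} "
    # Split the URL path on '/' so its tokens are visible to the subword model.
    doc = sep.join(part for part in [
        title.strip(),
        snippet.strip(),
        url_path.replace("/", " / ").strip(),
        (page_text or "").strip(),
    ] if part)
    return tokenizer(
        query.strip(),
        doc,
        truncation="only_second",
        max_length=MAX_LEN,
        padding="max_length",
        return_tensors="pt",
    )
```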
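For training and evaluation, a Hugging Face Trainer loop is a reasonable baseline. The sketch below assumes the same backbone, hyperparameters sized for a single 16GB GPU, and already-tokenized `train_ds`/`dev_ds` datasets plus a `compute_metrics` wrapper (hypothetical names, e.g. wrapping `headline_metrics` above):

```python
from transformers import (AutoModelForSequenceClassification, Trainer,
                          TrainingArguments)

model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/deberta-v3-small", num_labels=5)  # assumed backbone

args = TrainingArguments(
    output_dir="checkpoints/relevance-model",
    per_device_train_batch_size=32,
    gradient_accumulation_steps=2,   # effective batch of 64 on one 16GB GPU
    learning_rate=2e-5,
    num_train_epochs=3,
    fp16=True,                       # mixed precision to fit the memory budget
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="macro_f1",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,          # tokenized datasets (assumed prepared)
    eval_dataset=dev_ds,
    compute_metrics=compute_metrics, # assumed wrapper over headline_metrics
)
trainer.train()
```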
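For the imbalance and ordinal-confusion requirement, a common first step is inverse-frequency class weighting combined with label smoothing, which softens the penalty for near-miss predictions on adjacent grades; CORAL/CORN-style ordinal heads are a heavier alternative if adjacent confusion stays high. A sketch as a Trainer subclass, with weights derived from the label distribution stated above:

```python
import torch
import torch.nn.functional as F
from transformers import Trainer

# Inverse-frequency weights from the stated label distribution, ordered
# Fails (15%), Slightly (22%), Moderately (33%), Highly (19%), Fully (11%).
freqs = torch.tensor([0.15, 0.22, 0.33, 0.19, 0.11])
class_weights = 1.0 / freqs
class_weights = class_weights * len(freqs) / class_weights.sum()

class WeightedTrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        loss = F.cross_entropy(
            outputs.logits,
            labels,
            weight=class_weights.to(outputs.logits.device),
            label_smoothing=0.1,  # softens hard one-hot targets between grades
        )
        return (loss, outputs) if return_outputs else loss
```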
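For pre-deployment validation, a held-out report can check the normalized confusion matrix, the share of errors that jump more than one grade, and whether the probability-weighted expected grade tracks the true grade. A sketch using the same hypothetical arrays as the metrics example:

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.metrics import confusion_matrix

def validation_report(y_true, y_pred, y_prob):
    """Pre-deployment checks on a held-out slice."""
    cm = confusion_matrix(y_true, y_pred, normalize="true")

    # Distant confusions (more than one grade off) are the ones that
    # visibly damage ranking order; adjacent confusions are more forgivable.
    errs = y_pred != y_true
    distant = np.abs(y_pred - y_true) > 1
    distant_rate = distant.sum() / max(errs.sum(), 1)

    # Ordinal sanity check: the probability-weighted expected grade
    # should correlate strongly with the true grade.
    expected_grade = (y_prob * np.arange(y_prob.shape[1])).sum(axis=1)
    rho, _ = spearmanr(expected_grade, y_true)

    return {"confusion": cm, "distant_error_rate": distant_rate,
            "spearman_rho": rho}
```

If calibration drifts on the held-out slice, temperature scaling fitted on the dev set is a cheap correction before export; a shadow evaluation against the live ranking pipeline could then serve as the final gate.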