Business Context
QueryFind, a consumer web search product, wants to predict whether the top search result satisfied the user's need, so that ranking issues can be detected quickly and low-quality results can be demoted. You need to build an NLP system that estimates satisfaction from the query, the result snippet, the landing-page text, and lightweight behavioral signals.
Data
- Volume: 2.4M search sessions collected over 6 months
- Unit of prediction: one query-result pair for the top-ranked result
- Text fields: query (2-12 tokens), result title (5-20 tokens), snippet (20-180 tokens), landing-page extract (100-800 tokens)
- Language: English only
- Labels: Satisfied, Partially Satisfied, Not Satisfied
- Label distribution: 58% satisfied, 24% partially satisfied, 18% not satisfied
- Weak supervision source: reformulation rate, dwell time, pogo-sticking, and explicit thumbs-up/down feedback
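The fields above can be captured in a single record type. This is a minimal sketch; the names `SearchSession` and `Satisfaction`, and the exact shape of the behavioral signals, are assumptions, not given in the spec:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Satisfaction(Enum):
    SATISFIED = "satisfied"
    PARTIALLY_SATISFIED = "partially_satisfied"
    NOT_SATISFIED = "not_satisfied"

@dataclass
class SearchSession:
    # one query-result pair for the top-ranked result
    query: str                  # 2-12 tokens
    result_title: str           # 5-20 tokens
    snippet: str                # 20-180 tokens
    landing_page_extract: str   # 100-800 tokens
    # weak supervision signals (hypothetical encodings)
    reformulated: bool          # did the user reformulate the query?
    dwell_time_s: float         # seconds on the landing page
    pogo_stick: bool            # quick return to results page
    explicit_feedback: Optional[int]  # +1 thumbs-up, -1 thumbs-down, None absent
    label: Satisfaction         # derived from the signals above
```

Keeping the raw signals alongside the derived label makes it easy to re-derive labels later when the weak-supervision rules change.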
Success Criteria
A good solution should achieve macro-F1 >= 0.78 and recall >= 0.85 on the Not Satisfied class, and produce calibrated probabilities that can support ranking and monitoring decisions.
Constraints
- Inference latency must stay below 80 ms per query-result pair at p95
- The model must run in a Python service on a single T4 GPU or CPU fallback
- Training data contains noisy labels derived from behavior, so robustness matters more than leaderboard accuracy
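The p95 latency budget is straightforward to verify offline. A minimal sketch, assuming a hypothetical `predict_fn` that scores one query-result pair; in production this would be measured end-to-end in the serving path, not just around the model call:

```python
import time
import numpy as np

def p95_latency_ms(predict_fn, examples, warmup=10, runs=200):
    """Measure per-example p95 latency of predict_fn in milliseconds."""
    # warm up caches / JIT / GPU kernels before timing
    for _ in range(warmup):
        predict_fn(examples[0])
    timings = []
    for example in (examples * (runs // len(examples) + 1))[:runs]:
        t0 = time.perf_counter()
        predict_fn(example)
        timings.append((time.perf_counter() - t0) * 1000.0)
    return float(np.percentile(timings, 95))
```

A result below 80 ms on the target hardware (single T4 GPU, with a CPU fallback check) would satisfy the constraint.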
Requirements
- Formulate the task as a supervised NLP problem and define the target label.
- Design a preprocessing pipeline for query, snippet, and landing-page text.
- Implement a baseline and a transformer-based model in Python.
- Explain how you would handle weak labels, class imbalance, and short-vs-long text fields.
- Define an evaluation plan with offline metrics, validation strategy, and error analysis.
- Describe how you would decide whether the model is good enough for production use.
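As a starting point for the baseline requirement, one common approach is TF-IDF over the concatenated text fields feeding a class-weighted logistic regression, wrapped in probability calibration. This is a sketch under assumptions: the `[Q]`/`[T]`/`[S]`/`[P]` field tags, the hyperparameters, and sigmoid (Platt) calibration are illustrative choices, not prescribed by the task:

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.calibration import CalibratedClassifierCV

def join_fields(query, title, snippet, page):
    # field tags let the model weight short query text separately
    # from the much longer landing-page extract
    return f"[Q] {query} [T] {title} [S] {snippet} [P] {page}"

def make_baseline():
    return Pipeline([
        ("tfidf", TfidfVectorizer(
            ngram_range=(1, 2),
            max_features=200_000,
            sublinear_tf=True,       # dampen counts from long page text
        )),
        # class_weight="balanced" addresses the 58/24/18 label skew;
        # sigmoid calibration yields the calibrated probabilities the
        # success criteria ask for
        ("clf", CalibratedClassifierCV(
            LogisticRegression(max_iter=1000, class_weight="balanced"),
            method="sigmoid",
            cv=3,
        )),
    ])
```

A transformer cross-encoder over the same concatenated input would be the natural next step; this linear baseline mainly anchors the evaluation and sanity-checks the weak labels.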