BrightDesk, a SaaS customer support platform, wants to improve article retrieval in its help center. Users often search with natural-language questions, while the current system relies on keyword matching and misses semantically relevant documents.
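For concreteness, the keyword-matching baseline can be sketched as a minimal Okapi BM25 scorer in pure Python. The tokenization, the `k1`/`b` values, and the toy corpus below are illustrative assumptions, not BrightDesk's actual stack:

```python
import math
from collections import Counter

def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Score every document against a tokenized query with Okapi BM25."""
    n_docs = len(docs_tokens)
    avgdl = sum(len(d) for d in docs_tokens) / n_docs
    df = Counter()                       # document frequency per term
    for d in docs_tokens:
        df.update(set(d))
    scores = []
    for d in docs_tokens:
        tf = Counter(d)                  # term frequencies in this document
        score = 0.0
        for t in query_tokens:
            if t not in tf:
                continue
            idf = math.log(1 + (n_docs - df[t] + 0.5) / (df[t] + 0.5))
            norm = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[t] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Toy corpus (illustrative only)
docs = [["refund", "policy", "refund"],
        ["shipping", "times"],
        ["password", "reset", "help"]]
print(bm25_scores(["refund"], docs))         # the refund-policy doc wins
print(bm25_scores(["money", "back"], docs))  # every score is zero
```

The second query is a paraphrase of the first intent, yet it shares no terms with any document, so BM25 returns all zeros. That lexical blind spot is exactly the gap embedding-based retrieval is meant to close.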
You are given a corpus of 180,000 English help-center articles and resolved support tickets, plus 25,000 historical user queries with clicked or manually judged relevant documents. Documents range from 20 to 1,200 words (median 180), and queries range from 2 to 40 words. Roughly 65% of queries are short keyword-style searches, while 35% are conversational or paraphrased questions. Relevance labels are sparse: each query has 1-5 positive documents and many unlabeled candidates.
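When carving a held-out set from the 25,000 labeled queries, one reasonable precaution is to preserve the 65/35 split between keyword-style and conversational queries, so both styles are represented in evaluation. A hedged sketch follows; the `style` heuristic, function names, and the 20% held-out fraction are assumptions for illustration:

```python
import random

def stratified_split(queries, key, held_out_frac=0.2, seed=7):
    """Split queries into train/held-out sets, preserving per-style proportions."""
    rng = random.Random(seed)
    buckets = {}
    for q in queries:
        buckets.setdefault(key(q), []).append(q)
    train, held_out = [], []
    for items in buckets.values():
        rng.shuffle(items)
        cut = round(len(items) * (1 - held_out_frac))
        train += items[:cut]
        held_out += items[cut:]
    return train, held_out

def style(query_text):
    # Assumed heuristic: question-like or long queries count as conversational.
    if query_text.endswith("?") or len(query_text.split()) >= 6:
        return "conversational"
    return "keyword"

# Synthetic example mirroring the 65/35 mix described above
queries = [f"reset password {i}" for i in range(650)] + \
          [f"how do i change my billing email {i}?" for i in range(350)]
train, held_out = stratified_split(queries, key=style)
```

Because labels are sparse (1-5 positives per query), unlabeled candidates should be treated as unjudged rather than confirmed negatives when interpreting scores on this split.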
A strong solution should clearly compare embedding-based retrieval and traditional keyword search on relevance, robustness to paraphrasing, latency, and operational complexity. Target NDCG@10 ≥ 0.72 and Recall@20 ≥ 0.85 on held-out queries, while keeping p95 retrieval latency under 150 ms.
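Both targets can be measured with short, dependency-free functions. A sketch assuming binary relevance (each judged-positive document has gain 1) and string document ids:

```python
import math

def dcg_at_k(gains, k):
    """Discounted cumulative gain over the top-k gain values."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))

def ndcg_at_k(ranked_ids, relevant_ids, k=10):
    """NDCG@k with binary gains: DCG of the ranking over the ideal DCG."""
    gains = [1.0 if d in relevant_ids else 0.0 for d in ranked_ids]
    ideal = [1.0] * min(len(relevant_ids), k)
    return dcg_at_k(gains, k) / dcg_at_k(ideal, k) if relevant_ids else 0.0

def recall_at_k(ranked_ids, relevant_ids, k=20):
    """Fraction of judged-relevant documents retrieved in the top k."""
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids) if relevant_ids else 0.0

ranked = ["d7", "d2", "d9", "d4"]
print(ndcg_at_k(ranked, {"d7", "d9"}, k=10))            # ≈ 0.92
print(recall_at_k(ranked, {"d7", "d9", "d100"}, k=20))  # ≈ 0.67
```

Averaging these per-query values over the held-out set gives the numbers to compare against the 0.72 and 0.85 targets; note that with at most 5 positives per query, Recall@20 is well defined but sensitive to missing judgments.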