Business Context
NovaDesk, a SaaS support platform, is building a GenAI assistant that answers customer questions from product documentation, release notes, and historical support articles. The team wants you to justify which embedding approach to use and to implement an evaluation pipeline for retrieval quality.
Data
- Corpus size: 180,000 documents split into ~1.4M chunks
- Text length: 30-1,200 words per document; chunk size target 200-400 tokens (a chunking sketch follows this list)
- Language: English only
- Query volume: ~25,000 user questions per day
- Labels available: 12,000 query-document relevance pairs from support logs
- Class balance: Highly sparse relevance; most retrieved chunks are non-relevant
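
For reference, a minimal chunking sketch targeting the 200-400 token range above. Whitespace tokens are used as a rough proxy, and the `target` and `overlap` values are illustrative defaults rather than part of the spec:

```python
def chunk_document(text: str, target: int = 300, overlap: int = 50) -> list[str]:
    """Sliding-window chunking toward the 200-400 token target.

    Whitespace tokens are a rough proxy here; a production pipeline
    would count tokens with the embedding model's own tokenizer.
    """
    words = text.split()
    if len(words) <= target:
        return [text]
    chunks, step = [], target - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + target]))
        if start + target >= len(words):
            break
    return chunks
```

Overlapping windows keep sentences that straddle a chunk boundary retrievable from at least one chunk.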
Success Criteria
A good solution should improve semantic retrieval over a TF-IDF baseline, demonstrate strong ranking quality (e.g., recall@k and nDCG) on held-out relevance pairs, and support production inference with p95 embedding latency under 80 ms per query.
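
As one concrete reading of these criteria, a sketch of the offline ranking metrics and the latency check; `embed_fn` is a placeholder for whatever per-query embedding call the candidate pipeline exposes:

```python
import time
import numpy as np

def recall_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of a query's relevant docs that appear in the top k."""
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / max(len(relevant_ids), 1)

def ndcg_at_k(ranked_ids, relevant_ids, k=10):
    """Binary-relevance nDCG@k: DCG of the ranking over the ideal DCG."""
    rel = set(relevant_ids)
    dcg = sum(1.0 / np.log2(i + 2)
              for i, d in enumerate(ranked_ids[:k]) if d in rel)
    ideal = sum(1.0 / np.log2(i + 2) for i in range(min(len(rel), k)))
    return dcg / ideal if ideal > 0 else 0.0

def p95_latency_ms(embed_fn, queries):
    """Wall-clock p95 of per-query embedding time, in milliseconds."""
    times = []
    for q in queries:
        t0 = time.perf_counter()
        embed_fn(q)
        times.append((time.perf_counter() - t0) * 1000)
    return float(np.percentile(times, 95))
```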
Constraints
- Must run in a VPC; no external API calls at inference time
- Index must fit on a single CPU search node plus one GPU box for offline embedding jobs (see the indexing sketch after this list)
- Weekly re-indexing is acceptable; full retraining is not
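
A sketch of an offline embedding job compatible with these constraints, assuming sentence-transformers and FAISS; the model name, batch size, and output path are illustrative only:

```python
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

# Local model weights only -- satisfies the no-external-API constraint.
# all-MiniLM-L6-v2 is an illustrative choice, not a mandate from the brief.
model = SentenceTransformer("all-MiniLM-L6-v2", device="cuda")

def build_weekly_index(chunks: list[str], out_path: str = "chunks.faiss"):
    """Offline GPU embedding job; the flat index is then served on the CPU node."""
    vecs = model.encode(chunks, batch_size=256,
                        normalize_embeddings=True,  # cosine via inner product
                        show_progress_bar=True).astype(np.float32)
    index = faiss.IndexFlatIP(vecs.shape[1])
    index.add(vecs)
    faiss.write_index(index, out_path)
```

With a 384-dimensional model, 1.4M chunks in a flat inner-product index come to roughly 2 GB, comfortably within a single CPU search node.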
Requirements
- Build a retrieval pipeline for a GenAI assistant using dense embeddings.
- Compare at least two embedding approaches, including one sentence-transformer model.
- Define a realistic chunking and preprocessing strategy for technical documentation.
- Implement indexing, retrieval, and offline evaluation in modern Python (a minimal retrieval-plus-baseline sketch follows this list).
- Explain when you would choose domain-specific embeddings, general-purpose embeddings, or TF-IDF hybrids.
- Report retrieval metrics and recommend one production-ready embedding approach.
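
To make the comparison concrete, a minimal retrieval harness covering the dense index from the earlier sketch and a TF-IDF baseline over the same chunks; chunk row positions stand in for document ids, and all names are assumptions rather than mandated interfaces:

```python
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

# Dense retrieval over the index built by the offline job sketched earlier.
model = SentenceTransformer("all-MiniLM-L6-v2")   # illustrative model
index = faiss.read_index("chunks.faiss")          # illustrative path

def dense_search(query: str, k: int = 10) -> np.ndarray:
    q = model.encode([query], normalize_embeddings=True).astype(np.float32)
    _, ids = index.search(q, k)
    return ids[0]                                 # chunk row positions

def make_tfidf_search(chunks: list[str]):
    """TF-IDF baseline over the same chunk list, for a fair comparison."""
    vec = TfidfVectorizer(stop_words="english")
    mat = vec.fit_transform(chunks)
    def search(query: str, k: int = 10) -> np.ndarray:
        sims = linear_kernel(vec.transform([query]), mat).ravel()
        return np.argsort(sims)[::-1][:k]
    return search
```

Running both search functions over the held-out relevance pairs and feeding the results through the metric helpers above yields the side-by-side report the final requirement asks for.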