
You are reviewing a retrieval model and need to explain how to judge whether it is returning the right items for a query. The team wants a clear way to measure result quality when only some retrieved items are actually relevant.
How would you evaluate precision and recall for a retrieval model?
- How many retrieved items are relevant (precision)
- How many relevant items were missed (recall)
- How the metrics change at different cutoffs, such as top 10 and top 50
- Whether the retrieval set is meant for direct display or for downstream ranking
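The first three points can be made concrete with a minimal Python sketch. It assumes each query's results arrive as a ranked list of unique item IDs and that the set of truly relevant items is known. The function names `precision_recall_at_k` and `mean_precision_recall_at_k` are hypothetical. Precision@k is the fraction of the top k results that are relevant. Recall@k is the fraction of all relevant items that appear in the top k. As k grows, recall@k can only rise or stay flat, while precision@k often falls, so reporting both at several cutoffs exposes the trade-off.

```python
from collections.abc import Collection, Hashable, Sequence


def precision_recall_at_k(
    retrieved: Sequence[Hashable],
    relevant: Collection[Hashable],
    k: int,
) -> tuple[float, float]:
    """Return (precision@k, recall@k) for one query.

    Assumes `retrieved` is ranked best-first and contains unique item IDs.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    relevant_set = set(relevant)
    # Count relevant items among the top-k retrieved results.
    hits = sum(1 for item in retrieved[:k] if item in relevant_set)
    # Precision divides by k, so returning fewer than k items is penalized.
    precision = hits / k
    # Recall is undefined without relevant items; report 0.0 in that case.
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


def mean_precision_recall_at_k(
    queries: Sequence[tuple[Sequence[Hashable], Collection[Hashable]]],
    k: int,
) -> tuple[float, float]:
    """Macro-average precision@k and recall@k over (retrieved, relevant) pairs."""
    # Skip queries with no relevant items, since recall is undefined for them.
    scores = [
        precision_recall_at_k(retrieved, relevant, k)
        for retrieved, relevant in queries
        if len(relevant) > 0
    ]
    if not scores:
        return 0.0, 0.0
    n = len(scores)
    return sum(p for p, _ in scores) / n, sum(r for _, r in scores) / n


if __name__ == "__main__":
    # Toy data: hypothetical document IDs for a single query.
    retrieved = ["d1", "d7", "d2", "d9", "d4", "d5"]
    relevant = {"d1", "d2", "d5"}
    for k in (3, 6):
        p, r = precision_recall_at_k(retrieved, relevant, k)
        print(f"k={k}: precision={p:.2f} recall={r:.2f}")
    # k=3: precision=0.67 recall=0.67
    # k=6: precision=0.50 recall=1.00
```

This sketch divides precision by k even when fewer than k items come back; some evaluation tools divide by the number of items actually returned instead, so it is worth stating which convention is used. Which cutoff to emphasize depends on the last point: if results are shown directly to users, precision at a small cutoff such as top 10 matters most; if the set feeds a downstream ranker, recall at a larger cutoff such as top 50 matters more, because the ranker cannot recover items that were never retrieved.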