A comprehensive analysis of parameter settings for novelty-biased cumulative gain

ABSTRACT
In the TREC Web diversity track, novelty-biased cumulative gain (α-NDCG) is one of the official measures used to assess the retrieval performance of IR systems. The measure is characterised by a parameter, α, whose effect has not been thoroughly investigated. We find that the common setting of α, i.e. α=0.5, may prevent the measure from behaving as desired when evaluating result diversification: it excessively penalises systems that cover many intents while rewarding those that redundantly cover only a few. This issue is crucial because it strongly influences how systems are compared at top ranks. We revisit our previously proposed threshold and suggest that α be set on a per-query basis. The intuitiveness of the measure is then studied by examining actual rankings from the TREC 2009 and 2010 Web track submissions. Varying α according to our query-based threshold does not harm the discriminative power of α-NDCG and, in fact, improves its robustness. Experimental results show that thresholding α can make the measure more intuitive than its common settings.
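To make the role of α concrete, the following is a minimal Python sketch of how α-nDCG is typically computed (not code from the paper; document ids, intent sets, and the evaluation depth are illustrative). Each document's gain for an intent is discounted by (1-α) for every higher-ranked document already covering that intent, so α controls how strongly redundancy is penalised; the ideal gain is approximated greedily, as computing the exact ideal ranking is NP-hard.

```python
import math


def alpha_ndcg(ranking, intents, alpha=0.5, depth=10):
    """Compute alpha-nDCG@depth for a ranked list of document ids.

    ranking: list of document ids produced by the system
    intents: dict mapping intent id -> set of relevant document ids
    """

    def alpha_dcg(docs):
        seen = {i: 0 for i in intents}  # per-intent coverage counts so far
        score = 0.0
        for k, d in enumerate(docs[:depth]):
            gain = 0.0
            for i, rel in intents.items():
                if d in rel:
                    # novelty-biased gain: each repeat coverage of an
                    # intent is discounted by a further factor of (1-alpha)
                    gain += (1 - alpha) ** seen[i]
                    seen[i] += 1
            score += gain / math.log2(k + 2)  # log-based rank discount
        return score

    # Greedy approximation of the ideal ranking: at each rank pick the
    # document with the highest remaining novelty-biased gain.
    pool = set().union(*intents.values())
    ideal, seen = [], {i: 0 for i in intents}
    remaining = set(pool)
    for _ in range(min(depth, len(pool))):
        best, best_gain = None, -1.0
        for d in remaining:
            g = sum((1 - alpha) ** seen[i]
                    for i, rel in intents.items() if d in rel)
            if g > best_gain:
                best, best_gain = d, g
        ideal.append(best)
        remaining.discard(best)
        for i, rel in intents.items():
            if best in rel:
                seen[i] += 1

    denom = alpha_dcg(ideal)
    return alpha_dcg(ranking) / denom if denom > 0 else 0.0
```

With α=0.5, a ranking that covers a second intent early scores higher than one that first repeats an already-covered intent, which is exactly the redundancy/coverage trade-off the abstract argues is sensitive to the choice of α.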