Abstract
When a document retrieval system receives a query, a relevance model assigns each document a score based on its relevance to the query. Relevance models have parameters that should be tuned to optimise the model's accuracy for the document set and the expected queries, where accuracy is computed using an information retrieval evaluation function. Unfortunately, evaluation functions contain a discontinuous mapping from document scores to document ranks, which makes relevance models difficult to optimise with gradient-based methods. In this article, we identify that the evaluation function Rank-biased Precision (RBP) performs a conversion from document scores to ranks, and then from ranks to weights. We therefore investigate the utility of bypassing the conversion to ranks, converting document scores directly to RBP weights, for the purpose of tuning relevance models. We find that using transformed BM25 document scores in place of the RBP weights provides an equivalent optimisation function for mean and median RBP, so this document-score-based RBP can be used as a surrogate when tuning relevance models.
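To make the contrast concrete, the sketch below compares the standard rank-based RBP computation with a score-based surrogate of the kind investigated here. It is a minimal illustration under stated assumptions: the function names are ours, and the softmax score-to-weight transform is a placeholder, since the exact transformation applied to the BM25 scores is not specified in this abstract.

```python
import numpy as np

def rbp_from_ranking(relevance_by_rank, p=0.8):
    """Standard RBP: the document at rank i (1-based) receives the
    geometric weight (1 - p) * p**(i - 1), where p is the persistence."""
    relevance_by_rank = np.asarray(relevance_by_rank, dtype=float)
    weights = (1 - p) * p ** np.arange(len(relevance_by_rank))
    return float(np.sum(weights * relevance_by_rank))

def rbp_from_scores(scores, relevance, score_to_weight):
    """Score-based surrogate: weight each document's relevance by a smooth
    transform of its retrieval score, skipping the score-to-rank step."""
    weights = score_to_weight(np.asarray(scores, dtype=float))
    return float(np.sum(weights * np.asarray(relevance, dtype=float)))

if __name__ == "__main__":
    # Toy data: BM25-style scores and binary relevance judgements for one query.
    scores = np.array([12.3, 9.1, 7.4, 2.0, 0.5])
    relevance = np.array([1, 0, 1, 0, 0])

    order = np.argsort(-scores)  # the discontinuous score-to-rank conversion
    print("rank-based RBP :", rbp_from_ranking(relevance[order], p=0.8))

    # Placeholder transform only (softmax over scores), not the paper's mapping.
    softmax = lambda s: np.exp(s - s.max()) / np.exp(s - s.max()).sum()
    print("score-based RBP:", rbp_from_scores(scores, relevance, softmax))
```

The appeal of the surrogate is that the rank-based form depends on the document scores only through a sort, which is piecewise constant and therefore unhelpful to gradient-based optimisers, whereas a smooth score-to-weight transform keeps the objective differentiable in the relevance model's parameters.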
Cite this paper
Randeni, S., Matawie, K.M., Park, L.A.F.: An Investigation into the Use of Document Scores for Optimisation over Rank-Biased Precision. In: Sung, W.-K., et al. (eds.) Information Retrieval Technology. AIRS 2017. Lecture Notes in Computer Science, vol. 10648. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-70145-5_15