research-article

Bayesian Optimization for Optimizing Retrieval Systems

Authors:
Dan Li, University of Amsterdam, Amsterdam, Netherlands
Evangelos Kanoulas, University of Amsterdam, Amsterdam, Netherlands
WSDM '18: Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, February 2018, Pages 360–368. https://doi.org/10.1145/3159652.3159665

Published: 2 February 2018

ABSTRACT

The effectiveness of information retrieval systems depends heavily on a large number of hyperparameters that need to be tuned. Hyperparameters range from the choice of system components, e.g., stopword lists, stemming methods, or retrieval models, to model parameters, such as k1 and b in BM25, or the number of query expansion terms. Grid and random search, the dominant methods for finding the optimal system configuration, lack a search strategy to guide them through the hyperparameter space, which makes them inefficient and ineffective. In this paper, we propose to use Bayesian Optimization to jointly search and optimize over the hyperparameter space. Bayesian Optimization, a sequential decision-making method, suggests the next most promising configuration to test on the basis of the retrieval effectiveness of the configurations examined so far. To demonstrate the efficiency and effectiveness of Bayesian Optimization, we conduct experiments on TREC collections and show that it outperforms manual tuning, grid search, and random search, both in the retrieval effectiveness of the configuration found and in the efficiency of finding it.
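The sequential loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a Gaussian process surrogate with a squared-exponential kernel, an expected-improvement acquisition function, a fixed candidate grid, and a hypothetical `synthetic_effectiveness` function standing in for a real retrieval metric measured on a test collection. The bounds for k1 and b and every function name below are illustrative assumptions.

```python
import math
import random


def synthetic_effectiveness(k1, b):
    # Hypothetical stand-in for a retrieval metric such as MAP; a real
    # setup would run the queries and evaluate the resulting ranking.
    return 0.3 * math.exp(-((k1 - 1.2) ** 2) - ((b - 0.75) / 0.4) ** 2)


def kernel(p, q, length_scale=0.2):
    # Squared-exponential covariance between two points in the unit square.
    d2 = sum((a - c) ** 2 for a, c in zip(p, q))
    return math.exp(-0.5 * d2 / length_scale ** 2)


def cholesky(A):
    # Lower-triangular factor L with A = L L^T.
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def forward_solve(L, rhs):
    # Solve L x = rhs for lower-triangular L.
    x = []
    for i, r in enumerate(rhs):
        x.append((r - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x


def backward_solve(L, rhs):
    # Solve L^T x = rhs.
    n = len(rhs)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(L[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (rhs[i] - s) / L[i][i]
    return x


def expected_improvement(mu, sigma, best):
    # Expected gain over the best standardized score observed so far.
    z = (mu - best) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (mu - best) * cdf + sigma * pdf


def bayesian_optimization(objective, bounds, n_init=5, n_iter=20, seed=0):
    rng = random.Random(seed)

    def scale(u):
        # Map a point of the unit square to the real hyperparameter ranges.
        return [lo + t * (hi - lo) for t, (lo, hi) in zip(u, bounds)]

    grid = [(i / 20, j / 20) for i in range(21) for j in range(21)]
    X = [(rng.random(), rng.random()) for _ in range(n_init)]
    y = [objective(*scale(u)) for u in X]
    for _ in range(n_iter):
        # Standardize scores so a zero-mean, unit-variance prior is reasonable.
        mean = sum(y) / len(y)
        std = math.sqrt(sum((v - mean) ** 2 for v in y) / len(y)) or 1.0
        t = [(v - mean) / std for v in y]
        K = [[kernel(p, q) + (1e-6 if i == j else 0.0)
              for j, q in enumerate(X)] for i, p in enumerate(X)]
        L = cholesky(K)
        alpha = backward_solve(L, forward_solve(L, t))
        best_ei, best_u = -1.0, None
        for u in grid:
            if u in X:
                continue  # skip configurations that were already tested
            k = [kernel(u, p) for p in X]
            mu = sum(a * c for a, c in zip(alpha, k))
            v = forward_solve(L, k)
            sigma = math.sqrt(max(1.0 - sum(e * e for e in v), 1e-12))
            ei = expected_improvement(mu, sigma, max(t))
            if ei > best_ei:
                best_ei, best_u = ei, u
        # Evaluate the most promising configuration and add it to the data.
        X.append(best_u)
        y.append(objective(*scale(best_u)))
    i = max(range(len(y)), key=y.__getitem__)
    return scale(X[i]), y[i]
```

A call such as `config, score = bayesian_optimization(synthetic_effectiveness, [(0.0, 3.0), (0.0, 1.0)])` returns the best (k1, b) pair found and its score. Unlike grid or random search, each new configuration is chosen using the scores of all configurations evaluated before it.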

Index Terms

1. Information systems
  1. Information retrieval
    1. Retrieval models and ranking

Published in
WSDM '18: Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining
February 2018, 821 pages
ISBN: 9781450355810
DOI: 10.1145/3159652
General Chairs: Yi Chang (Jilin University, Huawei Inc.), Chengxiang Zhai (University of Illinois Urbana-Champaign)
Program Chairs: Yan Liu (University of Southern California), Yoelle Maarek (Amazon)
Copyright © 2018 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags
bayesian optimization
covariance function
hyperparameter optimisation
retrieval system
Acceptance Rates
WSDM '18 Paper Acceptance Rate: 81 of 514 submissions, 16%
Overall Acceptance Rate: 498 of 2,863 submissions, 17%