ABSTRACT
The effectiveness of an information retrieval system depends heavily on a large number of hyperparameters that must be tuned. These range from the choice of system components, e.g., stopword lists, stemming methods, or retrieval models, to model parameters, such as k1 and b in BM25 or the number of query expansion terms. Grid search and random search, the dominant methods for finding the optimal system configuration, lack a search strategy to guide them through the hyperparameter space, which makes them both inefficient and ineffective. In this paper, we propose to use Bayesian Optimization to jointly search and optimize over the hyperparameter space. Bayesian Optimization, a sequential decision-making method, suggests the next most promising configuration to test on the basis of the retrieval effectiveness of the configurations examined so far. To demonstrate its efficiency and effectiveness, we conduct experiments on TREC collections and show that Bayesian Optimization outperforms manual tuning, grid search, and random search, both in the retrieval effectiveness of the configuration found and in the efficiency of finding it.
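The sequential loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a Gaussian process surrogate with a fixed squared-exponential kernel and an expected-improvement acquisition function, both common choices that the abstract does not specify. The objective `toy_effectiveness` is a hypothetical stand-in for running a retrieval system and measuring its effectiveness, and the grid of BM25 `k1` and `b` values is likewise an assumption made for illustration.

```python
import math
import random


def toy_effectiveness(k1, b):
    """Hypothetical stand-in for a retrieval run; not real TREC results."""
    return 0.30 - 0.05 * (k1 - 1.2) ** 2 - 0.20 * (b - 0.75) ** 2


def rbf(x, y, length=0.3):
    """Squared-exponential kernel between two scaled configurations."""
    d2 = sum((a - c) ** 2 for a, c in zip(x, y))
    return math.exp(-0.5 * d2 / length ** 2)


def cholesky(a):
    """Lower-triangular L with L L^T = a, for a positive-definite matrix."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = math.sqrt(s) if i == j else s / l[j][j]
    return l


def solve_lower(l, b):
    """Forward substitution for L y = b."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(l[i][k] * y[k] for k in range(i))) / l[i][i])
    return y


def solve_upper_t(l, y):
    """Back substitution for L^T x = y."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(l[k][i] * x[k] for k in range(i + 1, n))) / l[i][i]
    return x


def fit_gp(xs, ys, jitter=1e-4):
    """Factor the kernel matrix of the configurations evaluated so far."""
    k = [[rbf(p, q) + (jitter if i == j else 0.0) for j, q in enumerate(xs)]
         for i, p in enumerate(xs)]
    l = cholesky(k)
    return l, solve_upper_t(l, solve_lower(l, ys))


def predict(xs, l, alpha, x_new):
    """Zero-mean GP posterior mean and standard deviation at x_new."""
    k_star = [rbf(p, x_new) for p in xs]
    mu = sum(ks * a for ks, a in zip(k_star, alpha))
    v = solve_lower(l, k_star)
    return mu, math.sqrt(max(1.0 - sum(t * t for t in v), 1e-12))


def expected_improvement(mu, sigma, best):
    """Expected improvement over the best score so far (maximization)."""
    z = (mu - best) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (mu - best) * cdf + sigma * pdf


def bayes_opt(objective, candidates, scale, n_init=3, n_iter=12, seed=0):
    """Return the list of (configuration, score) pairs in evaluation order."""
    rng = random.Random(seed)
    history = [(c, objective(*c)) for c in rng.sample(candidates, n_init)]
    for _ in range(n_iter):
        tried = [c for c, _ in history]
        raw = [s for _, s in history]
        mean = sum(raw) / len(raw)
        std = math.sqrt(sum((s - mean) ** 2 for s in raw) / len(raw)) or 1.0
        ys = [(s - mean) / std for s in raw]  # standardize observed scores
        xs = [scale(c) for c in tried]
        l, alpha = fit_gp(xs, ys)
        best = max(ys)
        remaining = [c for c in candidates if c not in tried]
        # Next configuration: the untried one with the highest acquisition value.
        nxt = max(remaining, key=lambda c: expected_improvement(
            *predict(xs, l, alpha, scale(c)), best))
        history.append((nxt, objective(*nxt)))
    return history


# Assumed search grid: k1 in [0, 3] and b in [0, 1].
CANDIDATES = [(round(0.3 * i, 1), round(0.1 * j, 1))
              for i in range(11) for j in range(11)]


def scale(config):
    """Map (k1, b) to the unit square so one kernel length fits both axes."""
    return (config[0] / 3.0, config[1])


if __name__ == "__main__":
    runs = bayes_opt(toy_effectiveness, CANDIDATES, scale)
    print(max(runs, key=lambda r: r[1]), "after", len(runs), "evaluations")
```

Unlike grid or random search, each new configuration here is chosen from the scores of all earlier ones, so the 15 evaluations are spent where the surrogate predicts either a high score or high uncertainty. A real setup would replace `toy_effectiveness` with an actual retrieval run and would typically also fit the kernel hyperparameters and handle categorical choices such as stemmer or retrieval model.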
Index Terms
- Bayesian Optimization for Optimizing Retrieval Systems