DOI: 10.1145/3159652.3159665
research-article

Bayesian Optimization for Optimizing Retrieval Systems

Published: 02 February 2018

ABSTRACT

The effectiveness of information retrieval systems depends heavily on a large number of hyperparameters that need to be tuned. Hyperparameters range from the choice of system components, e.g., stopword lists, stemming methods, or retrieval models, to model parameters, such as k1 and b in BM25, or the number of query expansion terms. Grid and random search, the dominant methods for finding the optimal system configuration, lack a strategy to guide them through the hyperparameter space, which makes them inefficient and ineffective. In this paper, we propose to use Bayesian Optimization to jointly search and optimize over the hyperparameter space. Bayesian Optimization, a sequential decision-making method, suggests the next most promising configuration to be tested on the basis of the retrieval effectiveness of the configurations examined so far. To demonstrate the efficiency and effectiveness of Bayesian Optimization, we conduct experiments on TREC collections and show that Bayesian Optimization outperforms manual tuning, grid search and random search, both in terms of the retrieval effectiveness of the configuration found and in terms of efficiency in finding this configuration.
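The sequential loop described above (model the effectiveness of the configurations evaluated so far, then test the most promising next one) can be sketched in pure Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a Gaussian-process surrogate with a fixed squared-exponential kernel and the expected-improvement acquisition function, tunes only BM25's k1 and b over assumed ranges (k1 in [0, 3], b in [0, 1]), and replaces real retrieval evaluation with a hypothetical synthetic objective. All names (`BOUNDS`, `toy_effectiveness`, `fit_gp`, `expected_improvement`, `bayesian_optimization`, `random_search`) are hypothetical.

```python
import math
import random

# Assumed search ranges for BM25's k1 and b (illustrative, not from the paper).
BOUNDS = [(0.0, 3.0), (0.0, 1.0)]


def toy_effectiveness(k1, b):
    # Hypothetical stand-in for "retrieve with BM25(k1, b) and score with MAP".
    # A smooth synthetic surface peaking at k1=1.2, b=0.75; not real data.
    return math.exp(-((k1 - 1.2) ** 2) / 0.5 - ((b - 0.75) ** 2) / 0.1)


def rbf(x, z, length=(0.5, 0.2)):
    # Squared-exponential kernel; the length scales are fixed assumptions.
    d2 = sum(((a - c) / l) ** 2 for a, c, l in zip(x, z, length))
    return math.exp(-0.5 * d2)


def cholesky(A):
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(A[i][i] - s) if i == j else (A[i][j] - s) / L[j][j]
    return L


def forward(L, v):
    # Solve L z = v for lower-triangular L.
    z = []
    for i, row in enumerate(L):
        z.append((v[i] - sum(row[k] * z[k] for k in range(i))) / row[i])
    return z


def backward(L, z):
    # Solve L^T x = z for lower-triangular L.
    n = len(L)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def fit_gp(X, y, jitter=1e-6):
    # Zero-mean GP on centred observations; returns a predictor x -> (mean, std).
    y_mean = sum(y) / len(y)
    K = [[rbf(a, c) + (jitter if i == j else 0.0) for j, c in enumerate(X)]
         for i, a in enumerate(X)]
    L = cholesky(K)
    alpha = backward(L, forward(L, [v - y_mean for v in y]))

    def predict(x):
        k = [rbf(a, x) for a in X]
        mean = y_mean + sum(ki * ai for ki, ai in zip(k, alpha))
        v = forward(L, k)
        var = max(1.0 - sum(vi * vi for vi in v), 1e-12)  # rbf(x, x) == 1
        return mean, math.sqrt(var)

    return predict


def expected_improvement(mean, std, best, xi=0.01):
    z = (mean - best - xi) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (mean - best - xi) * cdf + std * pdf


def sample(rng):
    return tuple(rng.uniform(lo, hi) for lo, hi in BOUNDS)


def bayesian_optimization(objective, n_init=3, n_iter=15, n_cand=500, seed=0):
    rng = random.Random(seed)
    X = [sample(rng) for _ in range(n_init)]
    y = [objective(*x) for x in X]
    for _ in range(n_iter):
        predict, best = fit_gp(X, y), max(y)
        # Pick the candidate with the highest EI (cheap random inner search).
        cands = [sample(rng) for _ in range(n_cand)]
        nxt = max(cands, key=lambda c: expected_improvement(*predict(c), best))
        X.append(nxt)
        y.append(objective(*nxt))
    i = max(range(len(y)), key=y.__getitem__)
    return X[i], y[i], y


def random_search(objective, budget=18, seed=0):
    # Baseline: evaluate `budget` uniformly random configurations.
    rng = random.Random(seed)
    X = [sample(rng) for _ in range(budget)]
    return max(((x, objective(*x)) for x in X), key=lambda p: p[1])
```

Calling `bayesian_optimization(toy_effectiveness)` and `random_search(toy_effectiveness)` spends the same budget of 18 evaluations, so the two strategies can be compared on equal terms. With a real system, `toy_effectiveness` would be replaced by a function that runs the configured retrieval pipeline on training topics and returns its effectiveness score. A fuller implementation would also learn the kernel length scales (for example by maximizing the GP marginal likelihood) and handle categorical choices such as the stemmer or stopword list, which this sketch omits.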


    Published in

      ACM Conferences
      WSDM '18: Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining
      February 2018
      821 pages
      ISBN:9781450355810
      DOI:10.1145/3159652

      Copyright © 2018 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States



      Acceptance Rates

      WSDM '18 Paper Acceptance Rate: 81 of 514 submissions, 16%. Overall Acceptance Rate: 498 of 2,863 submissions, 17%.
