Exploring the space of information retrieval term scoring functions
Introduction
The quest for new, high-performing IR scoring functions has been one of the main goals of IR research ever since the field began in the late 1940s. This quest has led to many IR models, from the Boolean model and the vector space model (Salton & McGill, 1983) to more recent proposals such as the language model (Ponte & Croft, 1998), the relevance model (Lavrenko & Croft, 2003), BM25 (Robertson & Zaragoza, 2009) and, more generally, probabilistic models (Jones, Walker, & Robertson, 2000a; 2000b), the HMM model (Metzler & Croft, 2005), the divergence from randomness framework (Amati & Rijsbergen, 2002) together with the information-based models (Clinchant & Gaussier, 2010), and learning-to-rank approaches (Liu, 2009).
These models originated from the fertile imagination and thinking of scientists, who either relied on first principles, within a given theoretical framework, to derive new scoring functions, or devised learning procedures to identify the best function in a given family of functions, typically the family of linear functions. The space of possible IR scoring functions is, however, tremendously larger than the one explored through such a process. Quoting Fan, Gordon, and Pathak (2004):
There is no guarantee that existing ranking functions are the best/optimal ones available. It seems likely that more powerful functions are yet to be discovered.
The goal of finding the best or optimal IR scoring function has led researchers to automatically explore the space of IR functions, typically through genetic programming and genetic algorithms, which were seen as a way to discover IR functions by exploring parts of the solution space stochastically (Cummins & O'Riordan, 2006b; Gordon, 1988; Pathak, Gordon, & Fan, 2000). But such attempts have always been limited by the complexity of the search space and, again, only explored a portion of it, without a clear understanding of which parts were explored and which were not.
We follow here a different route that aims at a more systematic exploration of the space of IR functions. We do so by first defining a grammar to generate possible IR functions up to a certain length (the length being related to the number of elements, variables and operations, involved in a function), and second by relying on IR heuristic constraints (Fang, Tao, & Zhai, 2004) to prune the search space and filter out bad scoring functions. Such a possibility was mentioned in Cummins and O’Riordan (2006b) but had not been tried, to the best of our knowledge, before our first study presented in Goswami, Moura, Gaussier, Amini, and Maes (2014).
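As a rough illustration of the kind of grammar-driven enumeration just described, the sketch below enumerates all well-formed expressions up to a given "length" (number of variables and operations) over a tiny, hypothetical set of terminals; the paper's actual grammar, variable set, and length definition (Section 4) differ.

```python
import itertools

# Illustrative terminals and operations -- NOT the paper's actual grammar,
# which is defined in Section 4 over the paper's own variables.
VARIABLES = ["tf", "idf", "dl"]   # term frequency, inverse doc. frequency, doc. length
UNARY = ["log", "sqrt"]
BINARY = ["+", "*", "/"]

def generate(length):
    """Enumerate well-formed expressions (as strings) whose total number of
    variables and operations equals `length`."""
    if length < 1:
        return set()
    if length == 1:
        return set(VARIABLES)
    exprs = set()
    # A unary operation consumes one unit of length.
    for sub in generate(length - 1):
        for op in UNARY:
            exprs.add(f"{op}({sub})")
    # A binary operation consumes one unit; the rest is split between operands.
    for left_len in range(1, length - 1):
        right_len = length - 1 - left_len
        for l, r in itertools.product(generate(left_len), generate(right_len)):
            for op in BINARY:
                exprs.add(f"({l} {op} {r})")
    return exprs

print(len(generate(3)))
```

Even with three variables and five operations, the count grows quickly with length, which is why the heuristic-constraint pruning discussed next matters.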
In addition, we perform extensive experiments on CLEF-3, TREC-3, TREC-5, TREC-6, TREC-7, TREC-8, WT10G and GOV2 to evaluate the performance of the scoring functions discovered by our search strategy. We show that these functions are simple yet effective on most of the collections and perform significantly better than other standard IR scoring functions as well as state-of-the-art genetic programming based approaches. While exploring the search space, we also compare the functions that do not satisfy the IR heuristic constraints with the ones that satisfy these constraints. As we will see, the latter set of functions significantly outperforms the former set, thus empirically validating the heuristic IR constraints used.
The current study builds upon the studies we presented in Goswami et al. (2014) and Goswami, Amini, and Gaussier (2015), and expands them in different ways: (a) we consider here functions of higher “length” (Section 4) so as to rely on a better and deeper exploration of the search space, (b) we provide here a detailed description of the framework used (Sections 4–6), (c) we show that the method is robust to the choice of the collection used to select the set of candidate scoring functions (Section 8.1), and (d) we show a complete comparison with genetic approaches (Section 8.6).
The remainder of the paper is organized as follows: Section 2 highlights the main contributions of this article; Section 3 discusses previous work and positions our work with respect to previous approaches; Section 4 introduces the function generation process and the grammar that underlies it; Section 5 describes how the function space can be formally pruned using IR heuristic constraints, while Section 6 describes the method followed to select the best performing functions from the pool of generated functions. Finally, Sections 7 and 8 describe the experiments conducted and the results obtained, while Section 9 concludes the paper.
Section snippets
Contributions
The key contributions of this article are summarized in the following points.
- (a)
An automated discovery approach for systematic exploration of IR function space. The primary contribution of this paper is the development of an automated discovery approach which can systematically explore the IR function space in order to find efficient IR scoring functions. For this, a context free grammar is defined that generates well-formed functions up to a certain length, and then intelligent strategies (such
Related work
The desire to explore the space of scoring functions to discover new and interesting IR functions is not new. The first attempts were based on genetic algorithms (Goldberg, 1989) and genetic programming (Koza, 1992). Genetic algorithms are heuristic optimization strategies inspired by the principles of biological evolution. Starting with an initial population of solutions, referred to as individuals, genetic operations (such as reproduction, mutation, and crossover) are iteratively used to create new
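The iterative scheme described above can be sketched in a few lines; the population size, selection rule, and the toy "one-max" fitness below are illustrative choices, not those of the cited IR studies.

```python
import random

def genetic_search(fitness, random_individual, mutate, crossover,
                   pop_size=20, generations=50, seed=0):
    """Minimal elitist genetic-algorithm loop: keep the fittest half of the
    population, then refill it with mutated crossovers of the survivors."""
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]           # selection
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = survivors + children
    return max(population, key=fitness)

# Toy usage: maximize the number of 1-bits in a 16-bit string ("one-max").
best = genetic_search(
    fitness=sum,
    random_individual=lambda rng: [rng.randint(0, 1) for _ in range(16)],
    mutate=lambda ind, rng: [b ^ (rng.random() < 0.05) for b in ind],
    crossover=lambda a, b, rng: a[:8] + b[8:],   # single-point crossover
)
print(sum(best))
```

The stochastic nature of such a loop is precisely why it explores only a portion of the search space, without guarantees on which parts are covered.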
Function generation
The notations we use throughout the paper are summarized in Table 1 (w represents a term).
Following Clinchant and Gaussier (2010), we retain the following general form for the score of a document d given a query q (denoted RSV for Retrieval Status Value):

RSV(q, d) = Σ_{w ∈ q∩d} a(x_w^q) · g(w, d; θ)

where θ is a set of parameters and a is a function of the number of occurrences of w in the query q, usually set to the identity function. In the remainder, the function g will be called a scoring function. Scoring
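The general form can be made concrete in a few lines; here a is the identity by default, and the particular g used (a log-TF × IDF product) is only an illustration, not one of the functions discovered in the paper.

```python
import math

def rsv(query_terms, doc_tf, doc_freq, n_docs, a=lambda x: x):
    """RSV(q, d) as a sum over query terms w of a(x_w^q) * g(w, d; theta),
    with a the identity by default and g an illustrative log-TF x IDF choice."""
    counts = {}
    for w in query_terms:                      # x_w^q: occurrences of w in q
        counts[w] = counts.get(w, 0) + 1
    score = 0.0
    for w, x_q in counts.items():
        tf = doc_tf.get(w, 0)
        if tf == 0:                            # only terms in both q and d contribute
            continue
        g = math.log(1 + tf) * math.log(n_docs / doc_freq[w])
        score += a(x_q) * g
    return score

score = rsv(["ir", "model"], {"ir": 3, "model": 1}, {"ir": 10, "model": 50}, n_docs=1000)
print(round(score, 3))
```

Swapping in a different g is all it takes to obtain a new retrieval model, which is what makes the space of scoring functions so large.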
Formal pruning of the function space
The grammar defined above generates functions that are combinations of the variables and operations retained. Such combinations may, however, not be valid from a mathematical point of view. For example, functions involving the logarithm of a negative element may well be generated by the grammar. It is thus important to ensure that all the binary and unary operations are applied within a correct domain of definition. The IR framework moreover imposes additional restrictions on the functions to be considered, known as
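Both kinds of filtering can be checked numerically, as in the sketch below: a domain-validity test and a term-frequency monotonicity test in the spirit of the constraints of Fang et al. (2004). The grid values and the candidate functions are illustrative choices, not the paper's exact procedure.

```python
import math

def has_valid_domain(g, tfs=range(1, 20), idf=2.0):
    """Reject functions whose operations leave their domain of definition
    (e.g. the logarithm of a non-positive number)."""
    try:
        return all(math.isfinite(g(t, idf)) for t in tfs)
    except ValueError:
        return False

def satisfies_tf_monotonicity(g, tfs=range(1, 20), idf=2.0):
    """The score should strictly increase with term frequency (a TFC1-style check)."""
    return all(g(t + 1, idf) > g(t, idf) for t in tfs)

good = lambda tf, idf: math.log(1 + tf) * idf    # valid and increasing in tf
bad = lambda tf, idf: idf / (1 + tf)             # valid but decreasing in tf
broken = lambda tf, idf: math.log(idf - tf)      # log of a non-positive number for tf >= idf

print(has_valid_domain(good), satisfies_tf_monotonicity(good))
print(has_valid_domain(broken), satisfies_tf_monotonicity(bad))
```

Functions failing either test can be discarded before any retrieval experiment is run, which is what makes the pruning cheap.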
Empirical selection of the candidate scoring functions
The method described above yields candidate scoring functions that can then be tested on different IR collections to determine which ones are the most promising. However, depending on the number of functions generated, an exhaustive test on several collections may be too costly. For example, applying Algorithm 1 to generate functions up to length 9 yields roughly 35,000 different candidate scoring functions. We provide in Table 3 the number of candidate functions available at each length, till
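One cheap way to cope with this cost is a two-stage selection: score every candidate on a single "training" collection first, keep only a top fraction, and reserve the expensive evaluation on all collections for that shortlist. The `evaluate` argument and the 10% cutoff in this sketch are assumptions for illustration, not the paper's values.

```python
def shortlist(candidates, evaluate, keep_fraction=0.1):
    """Return the best-scoring fraction of `candidates` under `evaluate`
    (e.g. mean average precision on one training collection)."""
    ranked = sorted(candidates, key=evaluate, reverse=True)
    k = max(1, int(len(ranked) * keep_fraction))
    return ranked[:k]

# Toy usage: candidates are integers, "quality" peaks at 42.
top = shortlist(range(100), evaluate=lambda c: -abs(c - 42))
print(top[:3])
```

With 35,000 candidates, such a filter reduces the full multi-collection evaluation to a few thousand functions at most.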
Experiments
We conducted a number of experiments aimed at validating the hypotheses at the basis of our function discovery approach and at assessing whether new, interesting IR scoring functions can be discovered.
Results
We first test the robustness of the final selected functions across the different collections used. We then validate the effectiveness of the heuristic IR constraints by comparing functions that satisfy them with functions that do not. Further, we compare the valid candidate functions with classical IR models using both default and tuned parameter values. Finally, we compare the best discovered functions with those produced by two state-of-the-art
Conclusion
This study aimed at discovering new IR scoring functions. To do so, we first defined a context free grammar that generates well-formed functions up to a certain length, and then used IR heuristic constraints to prune the set of candidate functions so as to focus on promising functions only. We then empirically validated that the scoring functions satisfying IR heuristic constraints perform significantly better than valid scoring functions which do not satisfy them. Extensive testing of the
References (45)
- et al. Improving the learning of Boolean queries by means of a multiobjective IQBE evolutionary algorithm. Information Processing and Management (2006).
- et al. A generic ranking function discovery framework by genetic programming for information retrieval. Information Processing and Management (2004).
- et al. Learning combination weights in data fusion using genetic algorithms. Information Processing and Management (2015).
- et al. Automated query learning with Wikipedia and genetic programming. Artificial Intelligence (2013).
- Crossover improvement for the genetic algorithm in information retrieval. Information Processing and Management (1998).
- et al. A combined component approach for finding collection-adapted ranking functions based on genetic programming. Proceedings of the 30th annual international ACM SIGIR conference on research and development in information retrieval (2007).
- et al. Probabilistic models of information retrieval based on measuring the divergence from randomness. ACM Transactions on Information Systems (2002).
- et al. Improving query expansion with stemming terms: A new genetic algorithm approach.
- et al. Advances in intelligent systems and computing (2014).
- et al. Using genetic algorithms to find suboptimal retrieval expert combinations. Proceedings of the 2002 ACM symposium on applied computing (SAC) (2002).
- Language and representation in information retrieval.
- Machine learning for information retrieval: Neural networks, symbolic learning, and genetic algorithms. Journal of the American Society for Information Science (JASIS).
- Information-based models for ad hoc IR. Proceedings of the 33rd annual international ACM SIGIR conference on research and development in information retrieval.
- Retrieval constraints and word frequency distributions: A log-logistic model for IR. Information Retrieval.
- Evolving general term-weighting schemes for information retrieval: Tests on larger collections. Artificial Intelligence Review.
- Evolved term-weighting schemes in information retrieval: An analysis of the solution space. Artificial Intelligence Review.
- Evolving local and global weighting schemes in information retrieval. Information Retrieval.
- Information extraction from the internet.
- Measuring constraint violations in information retrieval. Proceedings of the 32nd annual international ACM SIGIR conference on research and development in information retrieval.
- Personalization of search engine services for effective retrieval and knowledge management. Proceedings of the 21st international conference on information systems (ICIS).
- A formal study of information retrieval heuristics. Proceedings of the 27th annual international ACM SIGIR conference on research and development in information retrieval.
- Diagnostic evaluation of information retrieval models. ACM Transactions on Information Systems.