Exploring the space of information retrieval term scoring functions

https://doi.org/10.1016/j.ipm.2016.11.003

Highlights

  • A novel automated discovery approach to systematically explore IR function space.

  • Empirical analysis of heuristic IR constraints in light of the new discovery approach.

  • Experimental validation of effectiveness of discovered IR scoring functions.

Abstract

In this paper we are interested in finding good IR scoring functions by exploring the space of all possible IR functions. Earlier approaches, however, only explore a small part of this space, with no control over which parts are explored and which are not. We aim here at a more systematic exploration by first defining a grammar to generate possible IR functions up to a certain length (the length corresponding to the number of elements, i.e. variables and operations, involved in a function), and second by relying on IR heuristic constraints to prune the search space and filter out bad scoring functions. The candidate scoring functions thus obtained are tested on various standard IR collections, and several simple but promising functions are identified. We perform extensive experiments to compare these functions with classical IR models, and observe that they yield either better or comparable results. We also compare the performance of functions satisfying the IR heuristic constraints with that of functions which do not; the former set clearly outperforms the latter, which confirms the validity of IR heuristic constraints for designing new IR models.

Introduction

The quest for new, high-performing IR scoring functions has been one of the main goals of IR research ever since the beginning of the field in the late forties. This quest has led to many IR models, from the Boolean model and the vector space model (Salton & McGill, 1983) to more recent proposals such as the language model (Ponte & Croft, 1998) and the relevance model (Lavrenko & Croft, 2003), BM25 (Robertson & Zaragoza, 2009) and more generally probabilistic models (Jones, Walker, & Robertson, 2000a, 2000b), the HMM model (Metzler & Croft, 2005), the divergence from randomness framework (Amati & Rijsbergen, 2002) with the information-based models (Clinchant & Gaussier, 2010), and learning to rank approaches (Liu, 2009).

These models originated from the fertile imagination and thinking of scientists, who either relied on first principles, within a given theoretical framework, to derive new scoring functions, or devised learning procedures to identify the best function in a given family of functions, typically the family of linear functions. The space of possible IR scoring functions is, however, tremendously larger than the one explored through such a process. Quoting Fan, Gordon, and Pathak (2004):

There is no guarantee that existing ranking functions are the best/optimal ones available. It seems likely that more powerful functions are yet to be discovered.

The motivation of finding the best or optimal IR scoring function has led researchers to automatically explore the space of IR functions, typically through genetic programming and genetic algorithms, which were seen as a way to automatically find IR functions by exploring parts of the solution space stochastically (Cummins & O’Riordan, 2006b; Gordon, 1988; Pathak, Gordon, & Fan, 2000). Such attempts have, however, always been limited by the complexity of the search space, and again only explored a portion of it, without a clear understanding of which parts were explored and which were not.

We follow here a different route that aims at a more systematic exploration of the space of IR functions. We do so by first defining a grammar to generate possible IR functions up to a certain length (the length corresponding to the number of elements, i.e. variables and operations, involved in a function), and second by relying on IR heuristic constraints (Fang, Tao, & Zhai, 2004) to prune the search space and filter out bad scoring functions. Such a possibility was mentioned in Cummins and O’Riordan (2006b) but had not been tried, to the best of our knowledge, before our first study presented in Goswami, Moura, Gaussier, Amini, and Maes (2014).

In addition, we perform extensive experiments on CLEF-3, TREC-3, TREC-5, TREC-6, TREC-7, TREC-8, WT10G and GOV2 to evaluate the performance of the scoring functions discovered by our search strategy. We show that these functions are simple yet effective on most of the collections and perform significantly better than other standard IR scoring functions as well as state-of-the-art genetic programming based approaches. While exploring the search space, we also compare the functions that do not satisfy the IR heuristic constraints with the ones that satisfy these constraints. As we will see, the latter set of functions significantly outperforms the former set, thus empirically validating the heuristic IR constraints used.

The current study builds upon the studies we presented in Goswami et al. (2014) and Goswami, Amini, and Gaussier (2015), and expands them in different ways: (a) we consider here functions of higher “length” (Section 4) so as to rely on a better and deeper exploration of the search space, (b) we provide here a detailed description of the framework used (Sections 4–6), (c) we show that the method is robust to the choice of the collection used to select the set of candidate scoring functions (Section 8.1), and (d) we show a complete comparison with genetic approaches (Section 8.6).

The remainder of the paper is organized as follows: Section 2 highlights the main contributions of this article; Section 3 discusses previous work and positions our work with respect to previous approaches; Section 4 introduces the function generation process and the grammar that underlies it; Section 5 describes how the function space can be formally pruned using IR heuristic constraints, while Section 6 describes the method followed to select the best performing functions from the pool of generated functions. Finally, Sections 7 and 8 describe the experiments conducted and the results obtained, while Section 9 concludes the paper.

Section snippets

Contributions

The key contributions of this article are summarized in the following points.

  • (a)

    An automated discovery approach for systematic exploration of the IR function space. The primary contribution of this paper is the development of an automated discovery approach which can systematically explore the IR function space in order to find effective IR scoring functions. For this, a context-free grammar is defined that generates well-formed functions up to a certain length, and then intelligent strategies (such as pruning based on IR heuristic constraints) are used to restrict the search to promising candidate functions; a minimal sketch of such a grammar is given below.
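
As an illustration of what such grammar-based generation can look like, here is a minimal sketch in Python. The variable set (tf, df, dl, N) and the operation sets are assumptions made for this example only; the grammar actually used in the paper (Section 4) may differ in its symbols and production rules.

from itertools import product

# Illustrative variables and operations:
# tf = term frequency in the document, df = document frequency,
# dl = document length, N = number of documents in the collection.
VARIABLES = ["tf", "df", "dl", "N"]
UNARY_OPS = ["log", "sqrt"]
BINARY_OPS = ["+", "*", "/"]

def generate(length):
    """All well-formed expressions using exactly `length` symbols,
    each variable and each operation counting as one symbol."""
    if length == 1:
        return list(VARIABLES)
    expressions = []
    # A unary operation consumes one symbol, its argument the rest.
    for op in UNARY_OPS:
        for sub in generate(length - 1):
            expressions.append(f"{op}({sub})")
    # A binary operation consumes one symbol; the remaining symbols
    # are split between its two arguments.
    for op in BINARY_OPS:
        for left_len in range(1, length - 1):
            right_len = length - 1 - left_len
            for left, right in product(generate(left_len), generate(right_len)):
                expressions.append(f"({left} {op} {right})")
    return expressions

for l in range(1, 4):
    print(l, len(generate(l)))   # e.g. length 2 gives log(tf), sqrt(df), ...

Counting expressions this way makes explicit which part of the space is covered at each length, which is what allows the exploration to be exhaustive up to the chosen bound, in contrast to the stochastic search of genetic approaches.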

Related work

The will to explore the space of scoring functions to discover interesting and new IR functions is not new. The first attempts were based on genetic algorithms (Goldberg, 1989) and genetic programming (Koza, 1992). Genetic algorithms are heuristic optimization strategies inspired by the principles of biological evolution. Starting with an initial population of solutions, referred to as individuals, genetic operations (such as reproduction, mutation and crossover) are iteratively used to create new individuals.
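
For readers unfamiliar with this line of work, the following Python sketch shows the general shape of such a genetic-programming loop over scoring-function expression trees. Everything in it (the terminal and operator sets, the mutation and crossover rules, and the placeholder fitness function) is an illustrative assumption; in the cited work the fitness would be a retrieval effectiveness measure such as MAP computed on a training collection.

import random

TERMINALS = ["tf", "df", "dl"]          # illustrative variables
OPERATORS = ["+", "*", "/"]             # illustrative binary operations

def random_tree(depth=2):
    """Build a random expression tree: a terminal or (op, left, right)."""
    if depth == 0 or random.random() < 0.3:
        return random.choice(TERMINALS)
    return (random.choice(OPERATORS), random_tree(depth - 1), random_tree(depth - 1))

def mutate(tree):
    """With small probability, replace a subtree with a fresh random one."""
    if random.random() < 0.2:
        return random_tree()
    if isinstance(tree, str):
        return tree
    op, left, right = tree
    return (op, mutate(left), mutate(right))

def crossover(a, b):
    """Crude subtree swap: graft a copy of `b` somewhere inside `a`."""
    if isinstance(a, str) or random.random() < 0.3:
        return b
    op, left, right = a
    if random.random() < 0.5:
        return (op, crossover(left, b), right)
    return (op, left, crossover(right, b))

def fitness(tree):
    """Placeholder: a real fitness would evaluate retrieval effectiveness."""
    return -abs(len(str(tree)) - 40)

population = [random_tree() for _ in range(50)]
for _ in range(20):
    population.sort(key=fitness, reverse=True)
    parents = population[:10]                                    # selection
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(40)]
    population = parents + children                              # next generation
print(population[0])

Such a search only reaches the regions of the space accessible from the initial population through these stochastic operators, which is precisely the limitation the grammar-based enumeration of Section 4 is meant to address.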

Function generation

The notations we use throughout the paper are summarized in Table 1 (w represents a term).

Following Clinchant and Gaussier (2010), we retain the following general form for the score of a document d given a query q (denoted RSV, for Retrieval Status Value):

RSV(q,d) = \sum_{w \in q} a(t_w^q) \, g(t_w^d, N_w, l_d, \theta)

where \theta is a set of parameters and a is a function of the number of occurrences of w in the query q, usually set to the identity function. In the remainder, the function g will be called a scoring function. Scoring functions thus operate on the (normalized) frequency of w in d, the document frequency of w and the document length, together with the parameters \theta.
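
To make the general form concrete, here is a minimal Python sketch that computes an RSV under this decomposition. The particular g used (a length-normalised TF combined with an IDF-like factor) is only a stand-in for illustration, not one of the scoring functions studied or discovered in the paper, and the numbers are made up.

import math

def g(tf_d, df_w, dl, theta):
    """Illustrative scoring function g(t_w^d, N_w, l_d, theta)."""
    N, avg_dl = theta["N"], theta["avg_dl"]
    norm_tf = tf_d * avg_dl / dl                    # simple length normalisation
    return math.log(1.0 + norm_tf) * math.log(N / df_w)

def rsv(query_tf, doc_tf, df, dl, theta, a=lambda x: x):
    """RSV(q,d) = sum over query terms w of a(t_w^q) * g(t_w^d, N_w, l_d, theta).
    query_tf: term -> t_w^q, doc_tf: term -> t_w^d, df: term -> N_w."""
    return sum(a(q_tf) * g(doc_tf[w], df[w], dl, theta)
               for w, q_tf in query_tf.items() if w in doc_tf)

theta = {"N": 100_000, "avg_dl": 300}
score = rsv(query_tf={"retrieval": 1, "heuristics": 1},
            doc_tf={"retrieval": 3, "heuristics": 1, "model": 2},
            df={"retrieval": 5_000, "heuristics": 800, "model": 12_000},
            dl=250, theta=theta)
print(round(score, 3))

With this decomposition, exploring the space of IR models reduces to exploring the space of possible g functions, which is what the grammar introduced in this section enumerates.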

Formal pruning of the function space

The grammar G defined above generates functions that are combinations of the variables and operations retained. Such combinations may, however, not be valid from a mathematical point of view. For example, functions involving the logarithm of a negative quantity may well be generated by G. It is thus important to ensure that all the binary and unary operations have a correct domain of definition. The IR framework moreover imposes additional restrictions on the functions to be considered, known as IR heuristic constraints (Fang et al., 2004).
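
The sketch below illustrates, in Python, a purely numerical version of this kind of screening: a candidate g is rejected if its score does not increase with the term frequency or does not decrease with the document frequency, two conditions in the spirit of the heuristic constraints of Fang, Tao, and Zhai (2004). The grid of test values and the exact conditions are assumptions made for this example; the formal pruning described in the paper works on the analytical form of the functions rather than on sampled values.

import math

def satisfies_constraints(g, dl=300, theta=None):
    """Reject g unless it grows with term frequency and shrinks with document frequency."""
    theta = theta or {"N": 100_000, "avg_dl": 300}
    tfs = [1, 2, 3, 5, 8, 13]
    dfs = [10, 100, 1_000, 10_000]
    for df in dfs:                                   # increasing in term frequency
        scores = [g(tf, df, dl, theta) for tf in tfs]
        if any(s2 <= s1 for s1, s2 in zip(scores, scores[1:])):
            return False
    for tf in tfs:                                   # decreasing in document frequency
        scores = [g(tf, df, dl, theta) for df in dfs]
        if any(s2 >= s1 for s1, s2 in zip(scores, scores[1:])):
            return False
    return True

good = lambda tf, df, dl, theta: math.log(1 + tf) * math.log(theta["N"] / df)
bad = lambda tf, df, dl, theta: math.log(1 + tf) * math.log(1 + df)   # rewards frequent terms
print(satisfies_constraints(good), satisfies_constraints(bad))        # True False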

Empirical selection of the candidate scoring functions

The method described above yields candidate scoring functions that can then be tested on different IR collections to determine which ones are the most promising. However, depending on the number of functions generated, an exhaustive test on several collections may be too costly. For example, applying Algorithm 1 to generate functions up to length 9 yields roughly 35,000 different candidate scoring functions. We provide in Table 3 the number of candidate functions available at each length, up to this maximum length.
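
A minimal Python sketch of this kind of two-stage selection is given below: every candidate is first scored on a single development collection, and only the strongest fraction is promoted to the more expensive tests on all collections. The evaluation function, the 1% cut-off and the dummy candidates are assumptions for illustration; Section 6 defines the actual selection procedure used in the paper.

def select_candidates(candidates, evaluate, keep_ratio=0.01):
    """Keep the top fraction of candidates according to a development-collection
    score; `evaluate` would return e.g. MAP obtained when using the candidate
    as the scoring function g."""
    scored = sorted(((evaluate(c), c) for c in candidates),
                    key=lambda pair: pair[0], reverse=True)
    n_keep = max(1, int(keep_ratio * len(scored)))
    return [c for _, c in scored[:n_keep]]

# Toy usage with 35,000 dummy candidates and a stand-in evaluation measure.
dummy_candidates = [f"f_{i}" for i in range(35_000)]
dummy_evaluate = lambda c: (hash(c) % 1000) / 1000
shortlist = select_candidates(dummy_candidates, dummy_evaluate)
print(len(shortlist))   # 350 functions promoted to the full multi-collection tests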

Experiments

We conducted a number of experiments aimed at validating the hypotheses underlying our function discovery approach and at assessing whether new, interesting IR scoring functions can be discovered.

Results

We begin our investigation by testing the robustness of the final selected functions across the different collections used. We then validate the effectiveness of the heuristic IR constraints by comparing functions that satisfy them with functions that do not. Further, we compare the valid candidate functions with classical IR models, using both default and tuned parameter values. Finally, we compare the best discovered functions with those produced by two state-of-the-art genetic programming approaches.
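
As a side note on how such comparisons are typically reported, the sketch below runs a paired significance test over per-query average precision values for two scoring functions on the same topic set. The test choice (a paired t-test via SciPy) and the AP values are illustrative assumptions, not the paper's actual protocol or results.

from scipy import stats

# Per-query average precision for two runs over the same eight topics (made-up values).
ap_candidate = [0.31, 0.42, 0.18, 0.55, 0.27, 0.60, 0.35, 0.48]
ap_baseline  = [0.28, 0.40, 0.20, 0.50, 0.22, 0.58, 0.30, 0.45]

t_stat, p_value = stats.ttest_rel(ap_candidate, ap_baseline)
map_candidate = sum(ap_candidate) / len(ap_candidate)
map_baseline = sum(ap_baseline) / len(ap_baseline)
print(f"MAP candidate={map_candidate:.3f}  MAP baseline={map_baseline:.3f}  p={p_value:.3f}")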

Conclusion

This study aimed at discovering new IR scoring functions. To do so, we first defined a context-free grammar that generates well-formed functions up to a certain length, and then used IR heuristic constraints to prune the set of candidate functions so as to focus on promising functions only. We then empirically validated that the scoring functions satisfying the IR heuristic constraints perform significantly better than valid scoring functions which do not satisfy them. Extensive testing of the retained functions on standard IR collections showed that several simple functions perform as well as, or better than, classical IR models.

References (45)

  • D. Blair

    Language and representation in information retrieval

    (1990)
  • H. Chen

    Machine learning for information retrieval: Neural networks, symbolic learning, and genetic algorithms

    Journal of the American Society for Information Science (JASIS)

    (1995)
  • S. Clinchant et al.

    Information-based models for ad hoc IR

    Proceedings of the 33rd annual international ACM SIGIR conference on research and development in information retrieval

    (2010)
  • S. Clinchant et al.

    Retrieval constraints and word frequency distributions: A log-logistic model for IR

    Information Retrieval

    (2011)
  • R. Cummins et al.

    Evolving general term-weighting schemes for information retrieval: Tests on larger collections

    Artificial Intelligence Review

    (2005)
  • R. Cummins et al.

    Evolved term-weighting schemes in information retrieval: An analysis of the solution space

    Artificial Intelligence Review

    (2006)
  • R. Cummins et al.

    Evolving local and global weighting schemes in information retrieval

    Information Retrieval

    (2006)
  • R. Cummins et al.

    Information extraction from the internet

    (2009)
  • R. Cummins et al.

    Measuring constraint violations in information retrieval

    Proceedings of the 32nd annual international ACM SIGIR conference on research and development in information retrieval

    (2009)
  • W. Fan et al.

    Personalization of search engine services for effective retrieval and knowledge management

    Proceedings of the 21st international conference on information systems (ICIS)

    (2000)
  • H. Fang et al.

    A formal study of information retrieval heuristics

    Proceedings of the 27th annual international ACM SIGIR conference on research and development in information retrieval

    (2004)
  • H. Fang et al.

    Diagnostic evaluation of information retrieval models

    ACM Transactions on Information Systems

    (2011)