Relevance judgements for assessing recall

https://doi.org/10.1016/0306-4573(95)00061-5

Abstract

Recall and precision have become the principal measures of the effectiveness of information retrieval systems. Inherent in these measures of performance is the idea of a relevant document. Although recall and precision are easily and unambiguously defined, selecting the documents relevant to a query has long been recognized as problematic. To compare the performance of different systems, standard collections of documents, queries, and relevance judgments have been used. Unfortunately, the standard collections, such as SMART and TREC, have locked in a particular approach to relevance that is suitable for assessing precision but not recall. The problem is demonstrated by comparing two information retrieval methods over several queries, and showing how a new method of forming relevance judgments that is suitable for assessing recall gives different results. Recall is an interesting and practical issue, but current test procedures are inadequate for measuring it.
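
Because the argument hinges on how recall and precision are computed from relevance judgments, a minimal sketch may help (this is illustrative only, not from the article; the document ids and function name are hypothetical). Recall divides by the total number of relevant documents, so an incomplete pool of relevance judgments biases the measure, which is the problem the paper raises.

    # Minimal sketch (illustrative, not from the article) of computing
    # precision and recall for a single query from a relevance-judgment set.
    def precision_recall(retrieved, relevant):
        retrieved, relevant = set(retrieved), set(relevant)
        hits = retrieved & relevant  # retrieved documents judged relevant
        precision = len(hits) / len(retrieved) if retrieved else 0.0
        # Recall's denominator is ALL relevant documents; a document that
        # was never judged cannot appear in `relevant`, so an incomplete
        # judgment pool inflates the measured recall.
        recall = len(hits) / len(relevant) if relevant else 0.0
        return precision, recall

    # Hypothetical data: 3 of 5 retrieved documents are judged relevant,
    # out of 6 judged-relevant documents overall.
    p, r = precision_recall(["d1", "d2", "d3", "d4", "d5"],
                            ["d1", "d3", "d5", "d7", "d8", "d9"])
    print(f"precision={p:.2f}  recall={r:.2f}")  # precision=0.60  recall=0.50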

Cited by (27)

  • Usage-based chunking of Software Architecture information to assist information finding

    2016, Journal of Systems and Software
Citation excerpt:

    The researcher who defined the tasks was one of the judges who constructed the relevance judgments sets for the tasks. This follows the use of the person who formulated the query to build the relevance judgments set in Information Retrieval (Wallis and Thom, 1996). The oracle set for each task can be found in the table that shows the composition of chunks found for the respective task.

  • Evaluating epistemic uncertainty under incomplete assessments

    2008, Information Processing and Management
  • An algorithm to cluster documents based on relevance

    2005, Information Processing and Management