
Reproduce and Improve: An Evolutionary Approach to Select a Few Good Topics for Information Retrieval Evaluation

Published: 29 September 2018

Abstract

Effectiveness evaluation of information retrieval systems by means of a test collection is a widely used methodology. However, it is rather expensive in terms of resources, time, and money; therefore, many researchers have proposed methods for a cheaper evaluation. One particular approach, on which we focus in this article, is to use fewer topics: in TREC-like initiatives, system effectiveness is usually evaluated as the average effectiveness over a set of n topics (typically n = 50, although sets of more than 1,000 topics have also been adopted); instead of using the full set, it has been proposed to find the best subsets of a few good topics that evaluate the systems as similarly as possible to the full set. The computational complexity of this task has so far limited the analyses that could be performed. We develop a novel and efficient approach based on a multi-objective evolutionary algorithm. The higher efficiency of our new implementation allows us to reproduce some notable results on topic set reduction, as well as to perform new experiments that generalize and improve those results. We show that our approach is able both to reproduce the main state-of-the-art results and to analyze the effect of the collection, metric, and pool depth used for the evaluation. Finally, unlike previous studies, which have been mainly theoretical, we also discuss some practical topic selection strategies, integrating results of automatic evaluation approaches.
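
To make the topic-subset-selection task concrete, the sketch below is a minimal, illustrative example, not the article's method: it uses a random synthetic effectiveness matrix, an arbitrary fixed subset size, a simple single-objective evolutionary loop, and Kendall's tau between system orderings as the only objective, whereas the article's actual approach is a multi-objective evolutionary algorithm (NSGA-II) applied to real TREC data.

    # Illustrative sketch only: synthetic scores, a fixed subset size, and a
    # single-objective evolutionary loop stand in for the article's NSGA-II
    # setup on real TREC data. Requires numpy and scipy.
    import numpy as np
    from scipy.stats import kendalltau

    rng = np.random.default_rng(42)

    # Hypothetical effectiveness matrix: rows are systems, columns are topics
    # (e.g., per-topic average precision values).
    n_systems, n_topics, subset_size = 100, 50, 10
    scores = rng.random((n_systems, n_topics))
    full_means = scores.mean(axis=1)  # "ground truth": mean over all topics

    def fitness(subset):
        # Kendall's tau between the subset-based and full-set system orderings.
        tau, _ = kendalltau(scores[:, subset].mean(axis=1), full_means)
        return tau

    def mutate(subset):
        # Swap one selected topic for a topic currently outside the subset.
        child = subset.copy()
        outside = np.setdiff1d(np.arange(n_topics), child)
        child[rng.integers(subset_size)] = rng.choice(outside)
        return child

    # Greedy evolutionary loop: mutate the best subset found so far and keep
    # any offspring that ranks the systems more similarly to the full topic set.
    best = rng.choice(n_topics, size=subset_size, replace=False)
    best_fit = fitness(best)
    for _ in range(2000):
        child = mutate(best)
        child_fit = fitness(child)
        if child_fit > best_fit:
            best, best_fit = child, child_fit

    print(f"Best {subset_size}-topic subset: {sorted(best)}  (tau = {best_fit:.3f})")

In a realistic setting, the random matrix would be replaced by per-topic scores computed from actual runs, and the search would optimize several objectives at once (e.g., ranking similarity at several subset cardinalities), which is the kind of trade-off a multi-objective algorithm such as NSGA-II is designed to handle.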



      • Published in

Journal of Data and Information Quality, Volume 10, Issue 3
        Special Issue on Reproducibility in IR: Evaluation Campaigns, Collections and Analyses
        September 2018
        94 pages
        ISSN: 1936-1955
        EISSN: 1936-1963
        DOI: 10.1145/3282439

        Copyright © 2018 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 29 September 2018
        • Accepted: 1 July 2018
        • Revised: 1 April 2018
        • Received: 1 October 2017
        Published in JDIQ Volume 10, Issue 3


        Qualifiers

        • research-article
        • Research
        • Refereed
