Do user preferences and evaluation measures line up?

Research article · Published: 19 July 2010 · DOI: 10.1145/1835449.1835542

ABSTRACT

This paper presents results comparing user preferences for search engine rankings with effectiveness measures computed from a test collection. It establishes that preferences and evaluation measures correlate: systems measured as better on a test collection are preferred by users. This correlation is established both for "conventional web retrieval" and for retrieval that emphasizes diverse results. Of a selection of well-known measures, nDCG is found to correlate best with user preferences. Unlike previous studies in this area, this examination involved a large population of users, recruited through crowdsourcing, who were exposed to a wide range of retrieval systems, test collections, and search tasks. Reasons for user preferences were also gathered and analyzed. The work revealed a number of new results, but also showed that there is considerable scope for future work on refining effectiveness measures so that they better capture user preferences.
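
As context for the finding that nDCG tracks user preferences most closely, the sketch below shows one common formulation of nDCG (exponential gain with a log2 rank discount). This is a minimal illustration, not the paper's code: the abstract does not state which nDCG variant or cutoff was used, and the relevance grades and system rankings below are hypothetical.

    import math

    def dcg(gains, k=None):
        """Discounted cumulative gain over graded relevance gains (log2 rank discount)."""
        if k is not None:
            gains = gains[:k]
        return sum((2 ** g - 1) / math.log2(rank + 2) for rank, g in enumerate(gains))

    def ndcg(gains, k=None):
        """nDCG: DCG normalised by the DCG of the ideal (descending-gain) ordering."""
        ideal = dcg(sorted(gains, reverse=True), k)
        return dcg(gains, k) / ideal if ideal > 0 else 0.0

    # Hypothetical graded judgements (0-3) for the top results two systems return for one query.
    system_a = [3, 2, 0, 1, 0]
    system_b = [1, 3, 2, 0, 0]
    print(f"nDCG@5  A: {ndcg(system_a, 5):.3f}  B: {ndcg(system_b, 5):.3f}")

In a study of this kind, the measure's verdict (here, whichever ranking scores higher) is compared against which ranking users say they prefer, aggregated over many queries and system pairs.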

Published in

SIGIR '10: Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
July 2010, 944 pages
ISBN: 9781450301534
DOI: 10.1145/1835449

      Copyright © 2010 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher: Association for Computing Machinery, New York, NY, United States

Acceptance Rates

SIGIR '10 paper acceptance rate: 87 of 520 submissions, 17%. Overall acceptance rate: 792 of 3,983 submissions, 20%.
