DOI: 10.1145/2567948.2577271
tutorial

Scalability and efficiency challenges in large-scale web search engines

Published: 7 April 2014

ABSTRACT

The main goals of a web search engine are quality, efficiency, and scalability. In this tutorial, we focus on the last two goals, providing a fairly comprehensive overview of the scalability and efficiency challenges in large-scale web search engines. In particular, the tutorial provides an in-depth architectural overview of a web search engine, mainly focusing on the web crawling, indexing, and query processing components. The scalability and efficiency issues encountered in these components are presented at four different granularities: at the level of a single computer, a cluster of computers, a single data center, and a multi-center search engine. The tutorial also points at open research problems and provides recommendations to researchers who are new to the field.

References

  1. R. Baeza-Yates and B. Ribeiro-Neto. Modern Information Retrieval. Addison-Wesley Publishing Company, USA, 2nd edition, 2011.
  2. B. B. Cambazoglu and R. Baeza-Yates. Scalability challenges in web search engines. In M. Melucci, R. Baeza-Yates, and W. B. Croft, editors, Advanced Topics in Information Retrieval, volume 33 of The Information Retrieval Series, pages 27–50. Springer Berlin Heidelberg, 2011.
  3. B. B. Cambazoglu and R. Baeza-Yates. Scalability and efficiency challenges in commercial web search engines. In Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval, page 1124, 2013.
  4. C. Olston and M. Najork. Web crawling. Foundations and Trends in Information Retrieval, 4(3):175–246, 2010.
  5. J. Zobel and A. Moffat. Inverted files for text search engines. ACM Computing Surveys, 38(2), 2006.

    • Published in

      WWW '14 Companion: Proceedings of the 23rd International Conference on World Wide Web
      April 2014
      1396 pages
      ISBN: 9781450327459
      DOI: 10.1145/2567948

      Copyright © 2014 Copyright is held by the owner/author(s)

      Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 7 April 2014

      Qualifiers

      • tutorial

      Acceptance Rates

      Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%
