ABSTRACT
The main goals of a web search engine are quality, efficiency, and scalability. In this tutorial, we focus on the last two goals, providing a fairly comprehensive overview of the scalability and efficiency challenges in large-scale web search engines. In particular, the tutorial provides an in-depth architectural overview of a web search engine, focusing mainly on the web crawling, indexing, and query processing components. The scalability and efficiency issues encountered in these components are presented at four different granularities: a single computer, a cluster of computers, a single data center, and a multi-center search engine. The tutorial also points to open research problems and provides recommendations to researchers who are new to the field.