Abstract
This paper discusses the design and implementation of SDC, a new caching strategy designed to efficiently exploit the locality present in the stream of queries submitted to a Web Search Engine. SDC stores the results of the most frequently submitted queries in a fixed-size, read-only portion of the cache, while the queries that cannot be satisfied by the static portion compete for the remaining entries of the cache according to a given cache replacement policy. We experimentally demonstrate the superiority of SDC over purely static and purely dynamic policies by measuring the hit ratio achieved on two large query logs while varying the cache parameters and the replacement policy used. Finally, we propose an implementation optimized for concurrent accesses, and we accurately evaluate its scalability.
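The two-part organization described above can be illustrated with a minimal sketch. It is not the implementation evaluated in the paper: the class name SDCCache, the static_fraction parameter, the fetch callback, and the choice of LRU as the replacement policy for the dynamic portion are all assumptions made for illustration.

```python
from collections import Counter, OrderedDict


class SDCCache:
    """Illustrative static-dynamic cache: read-only static set plus LRU dynamic set."""

    def __init__(self, capacity, static_fraction, training_queries, fetch):
        # fetch is a hypothetical callback mapping a query to its results page.
        self.fetch = fetch
        static_size = int(capacity * static_fraction)
        self.dynamic_size = capacity - static_size
        # Static portion: results of the most frequent queries in a past log.
        # It is filled once here and never modified afterwards.
        top = Counter(training_queries).most_common(static_size)
        self.static = {query: fetch(query) for query, _ in top}
        # Dynamic portion: LRU order, least recently used entry first (assumed policy).
        self.dynamic = OrderedDict()
        self.hits = 0
        self.misses = 0

    def lookup(self, query):
        # 1. Try the static, read-only portion.
        if query in self.static:
            self.hits += 1
            return self.static[query]
        # 2. Try the dynamic portion and refresh the entry's recency.
        if query in self.dynamic:
            self.hits += 1
            self.dynamic.move_to_end(query)
            return self.dynamic[query]
        # 3. Miss: compute the results and let them compete for a dynamic entry.
        self.misses += 1
        results = self.fetch(query)
        if self.dynamic_size > 0:
            if len(self.dynamic) >= self.dynamic_size:
                self.dynamic.popitem(last=False)  # evict the least recently used
            self.dynamic[query] = results
        return results

    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

In this sketch, setting static_fraction to 0 yields a purely dynamic cache and setting it to 1 yields a purely static one, so the same class could be used to compare the three configurations on a query log.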
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Fagni, T., Perego, R., Silvestri, F. (2004). A Highly Scalable Parallel Caching System for Web Search Engine Results. In: Danelutto, M., Vanneschi, M., Laforenza, D. (eds) Euro-Par 2004 Parallel Processing. Euro-Par 2004. Lecture Notes in Computer Science, vol 3149. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27866-5_45
DOI: https://doi.org/10.1007/978-3-540-27866-5_45
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22924-7
Online ISBN: 978-3-540-27866-5
eBook Packages: Springer Book Archive