Abstract
Web robots periodically crawl Web sites to download their content, which can overload bandwidth and degrade performance. To cope with their presence, it is therefore important to understand and predict their behavior. An analysis of the traffic generated by some commercial robots shows that their access patterns vary: some revisit pages rather often and employ many cooperating clients, whereas others crawl the site thoroughly and extensively, following regular temporal patterns. Crawling activities are usually interleaved with inactivity periods whose duration is easily predicted.
Copyright information
© 2013 Springer-Verlag London
About this paper
Cite this paper
Calzarossa, M.C., Massari, L. (2013). Temporal Analysis of Crawling Activities of Commercial Web Robots. In: Gelenbe, E., Lent, R. (eds) Computer and Information Sciences III. Springer, London. https://doi.org/10.1007/978-1-4471-4594-3_44
DOI: https://doi.org/10.1007/978-1-4471-4594-3_44
Publisher Name: Springer, London
Print ISBN: 978-1-4471-4593-6
Online ISBN: 978-1-4471-4594-3
eBook Packages: Engineering, Engineering (R0)