
Temporal Analysis of Crawling Activities of Commercial Web Robots

  • Conference paper

Abstract

Web robots periodically crawl Web sites to download their content, potentially overloading bandwidth and degrading performance. To cope with their presence, it is therefore important to understand and predict their behavior. An analysis of the traffic generated by several commercial robots shows that their access patterns differ: some tend to revisit pages rather often and employ many cooperating clients, whereas others crawl the site thoroughly and extensively, following regular temporal patterns. Crawling activities are usually interleaved with inactivity periods whose duration can be easily predicted.
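The idea of crawling activities separated by inactivity periods can be made concrete with a small sketch. This is not the authors' method: it assumes a hypothetical log record (`Request`, holding a timestamp in seconds, a client IP and a URL), a hypothetical 30-minute gap threshold for splitting one robot's requests into activity periods, and a naive moving-average predictor (`predict_next_gap`, also hypothetical) for the length of the next inactivity period. For each activity period it also computes simple indicators of the access patterns described in the abstract: the fraction of requests that revisit an already requested page, and the number of distinct clients involved.

```python
from dataclasses import dataclass


@dataclass
class Request:
    # Hypothetical log record for a single robot request.
    timestamp: float  # seconds since the start of the log
    client_ip: str
    url: str


def split_activity(requests, gap_threshold=1800.0):
    """Split a robot's requests into activity periods.

    A new period starts whenever the time since the previous request
    exceeds gap_threshold seconds (assumed 30 minutes by default).
    Returns (sessions, gaps): the list of activity periods and the
    durations of the inactivity periods between them.
    """
    ordered = sorted(requests, key=lambda r: r.timestamp)
    sessions, gaps = [], []
    for req in ordered:
        if sessions and req.timestamp - sessions[-1][-1].timestamp <= gap_threshold:
            sessions[-1].append(req)  # still within the current activity period
        else:
            if sessions:
                # Record the inactivity period that just ended.
                gaps.append(req.timestamp - sessions[-1][-1].timestamp)
            sessions.append([req])
    return sessions, gaps


def session_summary(session):
    """Per-period indicators: volume, revisits, cooperating clients, duration."""
    urls = [r.url for r in session]
    distinct = len(set(urls))
    return {
        "requests": len(urls),
        "distinct_pages": distinct,
        "revisit_ratio": 1 - distinct / len(urls),  # share of repeated page requests
        "clients": len({r.client_ip for r in session}),
        "duration": session[-1].timestamp - session[0].timestamp,
    }


def predict_next_gap(gaps, window=3):
    """Naive predictor: mean of the last `window` inactivity periods."""
    recent = gaps[-window:]
    return sum(recent) / len(recent) if recent else None


# Small illustrative log (made-up values, not data from the paper).
log = [
    Request(0, "10.0.0.1", "/a"),
    Request(60, "10.0.0.2", "/b"),
    Request(120, "10.0.0.1", "/a"),
    Request(7320, "10.0.0.3", "/c"),
]

if __name__ == "__main__":
    sessions, gaps = split_activity(log)
    print(len(sessions), gaps)  # 2 [7200]
    print(session_summary(sessions[0]))
```

With the sample log, the first three requests form one activity period and the fourth, arriving two hours after the third, starts another, so `gaps` is `[7200]`. The segmentation depends heavily on the gap threshold; in practice one might choose it from the observed distribution of interarrival times rather than fixing it in advance.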



Author information


Correspondence to Luisa Massari.


Copyright information

© 2013 Springer-Verlag London

About this paper

Cite this paper

Calzarossa, M.C., Massari, L. (2013). Temporal Analysis of Crawling Activities of Commercial Web Robots. In: Gelenbe, E., Lent, R. (eds) Computer and Information Sciences III. Springer, London. https://doi.org/10.1007/978-1-4471-4594-3_44


  • DOI: https://doi.org/10.1007/978-1-4471-4594-3_44


  • Publisher Name: Springer, London

  • Print ISBN: 978-1-4471-4593-6

  • Online ISBN: 978-1-4471-4594-3

  • eBook Packages: Engineering; Engineering (R0)
