DOI: 10.1145/3197768.3201537
research-article

Error Analysis on Harvesting Data over the Internet

Published: 26 June 2018

ABSTRACT

Harvesting tasks gather information into a central repository. We studied 880560 harvesting tasks from 3446 harvesting services in 354 harvesting rounds over a period of 15 months. Of these tasks, 382705 failed, and the remaining ones occasionally returned fewer records. A significant part of the Open Archives Initiative harvesting services never worked or have ceased working, while many other services fail occasionally. A harvesting task includes many stages of information exchange, and each of them may fail, with different consequences each time. We studied the reported warning messages, the number of records returned, and the required response time to discover relations among them. We found that about half of the harvesting tasks in each harvesting round fail, and that the number of failing tasks is slowly increasing. We developed a method of analysis that can be used to reverse engineer such complex network systems and to categorize the reasons for failure into useful classes. Our results do not point to a new approach to harvesting or lead to breakthrough advice, but they make clear the complexity of the operation in an ever-changing networking environment and alert the reader that some facts that may be considered trivial actually are not. They help us to better understand the risks involved, and to design more reliable procedures and improved ways to monitor them closely.
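
The totals quoted above can be turned into a failure rate, and the same calculation can be repeated per harvesting round. The following is a minimal sketch, not the authors' method: the `TaskResult` record layout, the function names, and the rule that a task counts as failed when it returns no records are all assumptions made for illustration. Only the four constants come from the abstract.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional

# Totals reported in the abstract.
TOTAL_TASKS = 880560
FAILED_TASKS = 382705
SERVICES = 3446
ROUNDS = 354


def failure_rate(failed: int, total: int) -> float:
    """Fraction of tasks that failed."""
    if total <= 0:
        raise ValueError("total must be positive")
    return failed / total


@dataclass
class TaskResult:
    # Hypothetical record layout; the abstract does not describe a schema.
    service: str
    round_no: int
    records: int             # number of records returned
    seconds: float           # response time
    warning: Optional[str]   # reported warning message, if any


def failure_rate_per_round(results: Iterable[TaskResult]) -> Dict[int, float]:
    """Map each round number to the fraction of its tasks that failed.

    Assumption: a task is treated as failed when it returned no records.
    """
    totals: Dict[int, int] = {}
    failed: Dict[int, int] = {}
    for r in results:
        totals[r.round_no] = totals.get(r.round_no, 0) + 1
        if r.records == 0:
            failed[r.round_no] = failed.get(r.round_no, 0) + 1
    return {n: failed.get(n, 0) / totals[n] for n in totals}


if __name__ == "__main__":
    print(f"overall failure rate: {failure_rate(FAILED_TASKS, TOTAL_TASKS):.3f}")
    print(f"mean tasks per round: {TOTAL_TASKS / ROUNDS:.1f}")
```

With the reported totals the overall failure rate comes out at roughly 0.43, which is in line with the statement that about half of the tasks in each round fail.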


Published in

PETRA '18: Proceedings of the 11th PErvasive Technologies Related to Assistive Environments Conference
June 2018
591 pages
ISBN: 9781450363907
DOI: 10.1145/3197768

        Copyright © 2018 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Qualifiers

        • research-article
        • Research
        • Refereed limited
