ABSTRACT
Harvesting tasks gather information into a central repository. We studied 880560 harvesting tasks from 3446 harvesting services in 354 harvesting rounds over a period of 15 months; 382705 of these tasks failed, and the remaining tasks occasionally returned fewer records. A significant part of the Open Archives Initiative harvesting services never worked or have ceased working, while many other services fail occasionally. A harvesting task includes many stages of information exchange, and each of them may fail, with different consequences each time. We studied the reported warning messages, the number of records returned, and the required response time to discover relations among them. We found that about half of the harvesting tasks in each harvesting round fail, and that the number of failing tasks is slowly increasing. We developed a method of analysis that can be used to reverse engineer such complex network systems and to categorize the reasons for failure into useful classes. Our results do not indicate a new approach to harvesting or lead to breakthrough advice, but they make clear the complexity of the operation in an ever-changing networking environment and alert the reader that some facts that may be considered trivial actually are not. They help us to better understand the risks involved, and to design more reliable procedures and improved ways to monitor them closely.
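As a quick sanity check on the figures quoted above, the short Python sketch below derives the overall failure rate and the average number of tasks per round. Only the four input numbers come from the abstract; the variable names are illustrative, and the derived values are simple arithmetic rather than results reported by the authors.

```python
# Figures quoted in the abstract.
total_tasks = 880_560
failed_tasks = 382_705
services = 3_446
rounds = 354

# Derived quantities (plain arithmetic, not reported results).
non_failed_tasks = total_tasks - failed_tasks
failure_rate = failed_tasks / total_tasks
avg_tasks_per_round = total_tasks / rounds

print(f"non-failed tasks:    {non_failed_tasks}")
print(f"overall failure rate: {failure_rate:.1%}")
print(f"avg tasks per round:  {avg_tasks_per_round:.0f} (of {services} services)")
```

The overall rate comes out somewhat below one half, which is compatible with the statement that about half of the tasks in each round fail. The average number of tasks per round is lower than the number of services, so, assuming at most one task per service per round, not every service took part in every round.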
Index Terms
- Error Analysis on Harvesting Data over the Internet