ABSTRACT
Creating digital representations of ancient manuscripts, prints and maps is challenging because the sources are fragile and heterogeneous. Digitization requires highly specialized scanning hardware to cover the sources' diversity. The central task is to maximize reproduction quality while minimizing the error rate. This is difficult because digitization produces large amounts of image data, which places heavy computational loads on image processing modules and on error-detection and information retrieval heuristics. Because digital copies initially contain no information about their sources' semantics, additional effort is needed to extract semantic metadata. Doing so manually is error-prone and time-consuming, which calls for automated mechanisms to support the user. This paper introduces a decentralized, event-driven workflow system designed to overcome the above-mentioned challenges. It uses dynamic routing between workflow components and can therefore adapt quickly to each source's unique requirements. It scales by distributing high computational loads across multiple units instead of concentrating them on a single one, and it provides modules for automated image pre- and post-processing, error-detection heuristics, data mining, semantic analysis, metadata augmentation and quality assurance, as well as export to established publishing platforms or long-term storage facilities.
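The idea of event-driven, dynamic routing between workflow components can be illustrated with a minimal, single-process sketch. It is not the system described in the paper: the event names ("scanned", "needs_toponyms", "needs_qa"), the Router class and the three handler functions are hypothetical, and a real deployment would distribute handlers across machines rather than run them from one in-memory queue.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Event:
    kind: str   # hypothetical event name, e.g. "scanned"
    item: dict  # the digitized object plus its accumulated metadata


class Router:
    """Dispatches each event to the handlers registered for its kind."""

    def __init__(self):
        self.handlers = {}
        self.queue = deque()

    def on(self, kind, handler):
        self.handlers.setdefault(kind, []).append(handler)

    def emit(self, event):
        self.queue.append(event)

    def run(self):
        # Handlers return follow-up events, so the route through the
        # workflow is decided at run time, not fixed in advance.
        while self.queue:
            event = self.queue.popleft()
            for handler in self.handlers.get(event.kind, []):
                for follow_up in handler(event):
                    self.emit(follow_up)


def preprocess(event):
    item = event.item
    item["history"].append("preprocess")
    # Dynamic routing: maps take an extra step (assumed for illustration).
    next_kind = "needs_toponyms" if item["type"] == "map" else "needs_qa"
    return [Event(next_kind, item)]


def find_toponyms(event):
    event.item["history"].append("toponyms")
    return [Event("needs_qa", event.item)]


def quality_check(event):
    event.item["history"].append("qa")
    return []  # end of this item's route


def process(source_type):
    """Run one item of the given source type through the workflow."""
    router = Router()
    router.on("scanned", preprocess)
    router.on("needs_toponyms", find_toponyms)
    router.on("needs_qa", quality_check)
    item = {"type": source_type, "history": []}
    router.emit(Event("scanned", item))
    router.run()
    return item["history"]
```

Calling process("map") and process("manuscript") sends the two items along different routes, because each handler chooses the follow-up event from the item's own properties. Adding a new source type or processing step only requires registering another handler, without touching the existing ones.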
Index Terms
A scalable, distributed and dynamic workflow system for digitization processes