Abstract
This paper briefly introduces the main ideas behind a sustainable development optical character recognition (OCR) system based on open architecture techniques, then describes the construction of an OCR center built on computer clusters. The center's purpose is to dynamically improve the recognition precision of the digitized texts of the one million volumes produced by the China-US Million Books Digital Library (CADAL) Project. The experience gained at this center should provide a helpful reference for other digital library projects.
Additional information
Project supported by China-US Million Books Digital Library Project
About this article
Cite this article
Chen, H., Ji-hai, Z. & Xiao, H. A sustainable development OCR system in CADAL application. J. Zhejiang Univ. Sci. A 6, 1312–1317 (2005). https://doi.org/10.1631/jzus.2005.A1312
DOI: https://doi.org/10.1631/jzus.2005.A1312
Key words
- Sustainable development
- Digital library
- Optical character recognition (OCR)
- China-US Million Books Digital Library (CADAL)