A sustainable development OCR system in CADAL application

  • Digital Library Service and Management
Journal of Zhejiang University-SCIENCE A

Abstract

This paper briefly introduces the main ideas of a sustainable development optical character recognition (OCR) system based on open architecture techniques. It then describes the construction of an OCR center built on computer clusters, whose purpose is to dynamically improve the recognition precision of the digitized texts of the one million volumes produced by the China-US Million Books Digital Library (CADAL) Project. The experience gained at this center should provide a useful reference for other digital library projects.

Author information

Correspondence to Huang Chen.

Additional information

Project supported by the China-US Million Books Digital Library Project.

About this article

Cite this article

Chen, H., Ji-hai, Z. & Xiao, H. A sustainable development OCR system in CADAL application. J. Zhejiang Univ. Sci. A 6, 1312–1317 (2005). https://doi.org/10.1631/jzus.2005.A1312

  • DOI: https://doi.org/10.1631/jzus.2005.A1312
