Abstract
We present a recognition-based digitization method for building a digital library from a large collection of historical archives. Because most of the archives were manually transcribed in ancient Chinese characters, their digitization presents unique academic and practical challenges. By integrating layout analysis and character recognition into a single probabilistic framework, our system achieved a character recognition rate of 95.1% on the test data set, despite the obsolete characters and unique variants used in the archives. Combined with an intuitive verification and correction interface, the system freed operators from repetitive typing tasks and significantly improved overall throughput.
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kim, M.S., Ryu, S., Cho, K.T., Rhee, T.H., Choi, H.I., Kim, J.H. (2005). Recognition-Based Digitalization of Korean Historical Archives. In: Myaeng, S.H., Zhou, M., Wong, KF., Zhang, HJ. (eds) Information Retrieval Technology. AIRS 2004. Lecture Notes in Computer Science, vol 3411. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-31871-2_24
DOI: https://doi.org/10.1007/978-3-540-31871-2_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25065-4
Online ISBN: 978-3-540-31871-2
eBook Packages: Computer Science, Computer Science (R0)