Abstract
The analysis of historical document images is of interest not only for preserving historical heritage but also for extracting semantic knowledge. In this paper we present a word spotting approach for finding keyword images in digital archives. The detected words make it possible to construct metadata on document contents for indexing and retrieval purposes. Instead of using OCR-based approaches, which would require accurate segmentation and high image quality, we propose a shape recognition method based on the well-known shape context descriptor. Our method is shown to be robust to highly distorted and noisy document images, a common problem in the analysis of old documents. It has been used in a real application scenario, the Collection of Border Records of the Girona Archive. In particular, spotted keywords are used to extract knowledge about the personal data of people referred to in the documents.
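As an illustration of the shape context descriptor mentioned in the abstract, the following is a minimal sketch and not the authors' implementation. It builds a log-polar histogram of where the other sampled contour points lie relative to one reference point, and compares two such histograms with a chi-squared cost. The function names, the bin counts (5 radial, 12 angular) and the radius limits are assumptions chosen for illustration. The sketch also assumes at least two distinct points.

```python
import math


def mean_pairwise_distance(points):
    """Mean distance over all point pairs, used to normalise scale."""
    total, count = 0.0, 0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            total += math.dist(points[i], points[j])
            count += 1
    return total / count


def shape_context(points, index, scale, n_radial=5, n_angular=12,
                  r_min=0.125, r_max=2.0):
    """Log-polar histogram of all other points relative to points[index].

    Bin counts and radius limits are illustrative assumptions.
    """
    hist = [[0] * n_angular for _ in range(n_radial)]
    px, py = points[index]
    log_min, log_max = math.log(r_min), math.log(r_max)
    for i, (x, y) in enumerate(points):
        if i == index:
            continue
        dx, dy = x - px, y - py
        r = math.hypot(dx, dy) / scale
        if r == 0:
            continue  # skip points that coincide with the reference point
        r = min(max(r, r_min), r_max)  # clamp into the histogram range
        r_bin = int((math.log(r) - log_min) / (log_max - log_min) * n_radial)
        r_bin = min(r_bin, n_radial - 1)
        theta = math.atan2(dy, dx) % (2 * math.pi)
        a_bin = min(int(theta / (2 * math.pi) * n_angular), n_angular - 1)
        hist[r_bin][a_bin] += 1
    return [c for row in hist for c in row]  # flatten to one vector


def chi2_cost(h1, h2):
    """Chi-squared distance between two histograms after normalisation."""
    n1, n2 = sum(h1), sum(h2)
    cost = 0.0
    for a, b in zip(h1, h2):
        a, b = a / n1, b / n2
        if a + b > 0:
            cost += (a - b) ** 2 / (a + b)
    return 0.5 * cost


# Toy contour sample (hypothetical coordinates, not taken from the paper).
pts = [(0, 0), (4, 0), (4, 3), (0, 3), (2, 5)]
scale = mean_pairwise_distance(pts)
h0 = shape_context(pts, 0, scale)
h2 = shape_context(pts, 2, scale)
print(chi2_cost(h0, h2))
```

Because distances are divided by the mean pairwise distance and offsets are taken relative to the reference point, the histogram does not change under translation or uniform scaling of the point set. In a word spotting setting, one would typically compute such descriptors for points sampled on a query word and on a candidate region, then aggregate the pairwise costs into a match score.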
This work has been partially supported by the Spanish project TIN2006-15694-C02-02 and the Subdirecció General d’Arxius de la Generalitat de Catalunya.
Copyright information
© 2007 Springer Berlin Heidelberg
About this paper
Cite this paper
Lladós, J., Pratim-Roy, P., Rodríguez, J.A., Sánchez, G. (2007). Word Spotting in Archive Documents Using Shape Contexts. In: Martí, J., Benedí, J.M., Mendonça, A.M., Serrat, J. (eds) Pattern Recognition and Image Analysis. IbPRIA 2007. Lecture Notes in Computer Science, vol 4478. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72849-8_37
DOI: https://doi.org/10.1007/978-3-540-72849-8_37
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-72848-1
Online ISBN: 978-3-540-72849-8
eBook Packages: Computer Science, Computer Science (R0)