Original article
An efficient gray-level thresholding algorithm for historic document images
Section snippets
Research aims
Despite the widespread use of broadband Internet connections, it remains very difficult to access a set of thousands of true-color images of historical documents, each occupying almost 500 KBytes. As most of these archives are handwritten, conversion to a text file is not currently an appropriate solution. One possibility is to convert the images to black-and-white, thus decreasing the amount of data that must be stored and transmitted. This conversion is not so easily done
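To make the storage argument concrete, the following minimal sketch binarizes a tiny true-color patch with a fixed cut-off and compares byte counts. It is only an illustration under stated assumptions: the pixel values, the 4x4 patch, the threshold of 128, and the helper names `to_gray`, `binarize`, and `pack_bits` are all hypothetical, and a fixed global threshold is not the algorithm proposed in the paper.

```python
def to_gray(rgb):
    """Convert an (r, g, b) pixel to a gray level in 0..255 (ITU-R BT.601 luma weights)."""
    r, g, b = rgb
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def binarize(pixels, threshold):
    """Return 1 (ink) for pixels at or below the threshold, 0 (paper) otherwise."""
    return [1 if to_gray(p) <= threshold else 0 for p in pixels]


def pack_bits(bits):
    """Pack 0/1 values into bytes, 8 pixels per byte; the last byte is zero-padded."""
    out = bytearray()
    for i in range(0, len(bits), 8):
        chunk = bits[i:i + 8]
        byte = 0
        for bit in chunk:
            byte = (byte << 1) | bit
        byte <<= 8 - len(chunk)  # pad a short final chunk on the right
        out.append(byte)
    return bytes(out)


# Hypothetical 4x4 patch: dark ink strokes on yellowed paper (values are made up).
INK, PAPER = (40, 30, 25), (220, 205, 170)
patch = [PAPER, INK, INK, PAPER,
         PAPER, INK, PAPER, PAPER,
         PAPER, INK, PAPER, PAPER,
         PAPER, INK, INK, PAPER]

bits = binarize(patch, threshold=128)  # 128 is an arbitrary illustrative cut-off
packed = pack_bits(bits)
raw_size = len(patch) * 3              # 3 bytes per true-color pixel

print(f"true color: {raw_size} bytes, bilevel: {len(packed)} bytes")
```

Before any compression, going from 24 bits to 1 bit per pixel cuts the raw data by a factor of 24. The hard part, which this sketch sidesteps, is choosing a threshold that separates ink from aged or stained paper.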
Conclusions
This paper presents some advances in the thresholding of images of historical documents. The algorithm developed is applied to a set of documents from the end of the 19th century and the beginning of the 20th century. The documents are part of the bequest of Joaquim Nabuco, one of the most important personalities in Brazilian history, who had a major role in the campaign to free slaves in Brazil.
In order to make the documents more easily available, a conversion of the images from a color template
Acknowledgments
This research is partially sponsored by CNPq, UPE, Universidad Rey Juan Carlos and Agencia Española de Cooperación Internacional (AECI) contract no. A/2948/05.