Elsevier

Pattern Recognition

Volume 86, February 2019, Pages 37-47
A selectional auto-encoder approach for document image binarization

https://doi.org/10.1016/j.patcog.2018.08.011

Highlights

  • A selectional autoencoder approach for document image binarization is studied.

  • The neural network is devoted to learning an image-to-image binarization.

  • Comprehensive experimentation with datasets of different typology is presented.

  • Results demonstrate that the approach is able to outperform the state of the art.

Abstract

Binarization plays a key role in automatic information retrieval from document images. This process is usually performed in the first stages of document analysis systems and serves as a basis for subsequent steps. Hence, it has to be robust in order for the full analysis workflow to succeed. Several methods for document image binarization have been proposed so far, most of which are based on hand-crafted image processing strategies. Recently, Convolutional Neural Networks have shown remarkable performance in many disparate tasks related to computer vision. In this paper we discuss the use of convolutional auto-encoders devoted to learning an end-to-end map from an input image to its selectional output, in which activations indicate the likelihood of pixels to be either foreground or background. Once the model is trained, documents can therefore be binarized by passing them through the model and applying a global threshold. This approach has proven to outperform existing binarization strategies in a number of document types.

Introduction

Image binarization consists in assigning a binary value to every single pixel of an image. Within the context of document analysis systems, the main objective is to distinguish the foreground (meaningful information) from the background.

Binarization plays a key role in the workflow of many document analysis and recognition systems [1], [2], [3], [4], [5]. It not only helps to reduce the complexity of the task but is also advisable for procedures involving morphological operations, detection of connected components, or histogram analysis, among others. Many methods have been proposed to accomplish this task. However, good results are often hard to attain because documents may exhibit several kinds of degradation (such as irregular leveling, blots, and bleed-through) that may cause the process to fail.

In addition to all these obstacles, it is worth emphasizing that a single method rarely works well across many document styles, since the set of potential domains is very heterogeneous. To deal with this situation, we discuss a machine-learning framework for binarizing document images. That is, a ground-truth set of examples is used to train a model to perform the binarization task. This allows the same approach to be used on a wide range of documents, provided there is specific ground-truth data with which to train a new model for each document type.

Specifically, we make use of Convolutional Neural Networks (CNNs) [6]. These networks involve multi-layer architectures that apply a series of transformations (convolutions) to the input signal. The parameters of these transformations are adjusted through a training process. CNNs have dramatically improved the state of the art in many tasks such as image, video, and speech processing [7]. Thus, their use for document binarization is promising. In this case, we consider an image-to-image convolutional architecture, which is trained to transform an input image into its binarized version.

Our experiments focus on testing this strategy in different document types, namely Latin text documents, palm leaf scripts, Persian documents, and music scores. We also compare the approach against other classical and state-of-the-art algorithms for binarization, showing that this approach leads to a significant improvement.

The remainder of the paper is structured as follows: work related to document image binarization is introduced in Section 2; the image-to-image binarization framework based on convolutional models is described in Section 3; the experiments are presented in Section 4, including model configuration, training strategies, comparison with existing techniques, and cross-document adaptation; and finally, conclusions are drawn in Section 5.

Section snippets

Background

The most straightforward procedure for image binarization is to resort to simple thresholding, in which all pixels under a certain grayscale value are set to 0, and those above to 1. This threshold can be fixed by hand, yet algorithms such as Otsu’s [8] automatically estimate a value according to the input image. However, as the complexity of the document to process increases, this simple procedure usually leads to poor or irregular binarization, and so it is preferable to resort to other kind
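The global thresholding described in this snippet can be illustrated with a minimal sketch. It is not code from the paper: it is a plain-Python rendering of the commonly described form of Otsu's criterion (choose the gray level that maximizes the between-class variance of the histogram), and the function names `otsu_threshold` and `binarize`, the 256-level assumption, and the toy image are all illustrative assumptions.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximizes between-class variance.

    Pixels with value <= t form one class and pixels > t the other.
    `pixels` is a flat iterable of integers in [0, levels).
    """
    pixels = list(pixels)
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the lower class
    sum0 = 0  # sum of intensities in the lower class
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # lower class still empty
        w1 = total - w0
        if w1 == 0:
            break     # upper class empty from here on
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(image, threshold):
    """Map pixels above the threshold to 1 and all others to 0."""
    return [[1 if p > threshold else 0 for p in row] for row in image]


# Toy grayscale image (assumed data): dark values on the left, bright on the right.
image = [[10, 12, 200],
         [11, 210, 205]]
t = otsu_threshold(p for row in image for p in row)
binary = binarize(image, t)
```

Because a single threshold is applied to the whole image, this kind of procedure cannot adapt to uneven illumination or stains, which is the limitation the snippet goes on to mention.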

Selectional auto-encoder for document image binarization

From a machine learning point of view, image binarization can be formulated as a two-class classification task at pixel level. The framework proposed follows this idea and, therefore, consists in learning to estimate which label must be given to every single pixel of an image. Since we are dealing with images of documents, we define the set of labels as foreground and background. As mentioned above, a way to implement this approach is to use a neural network to estimate which of these labels

Experiments

This section details the experimentation carried out to evaluate the discussed approach. The performance of a binarization algorithm can be evaluated in several ways. For instance, if the algorithm is part of a workflow to perform a particular task, an interesting way to measure the performance is in relation to the final performance. However, this implies that the evaluation of the algorithm may not be totally fair, since it would be strongly related to the performance of the rest of the

Conclusions

In this paper an approach for document image binarization has been presented. The strategy is to train a Selectional Auto-Encoder (SAE) that is able to learn an end-to-end transformation to binarize an image. Given an image patch of fixed size, the model outputs a selectional value for each pixel of the patch, reflecting the confidence that the pixel belongs to the foreground of the document. These values are eventually thresholded to yield a discrete binary result.
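The inference step summarized above (fixed-size patches in, per-pixel selectional values out, then a threshold) might be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: `predict_patch` is a hypothetical stand-in for a trained SAE, and the non-overlapping tiling, the border padding with a white value of 1.0, the patch size, and the 0.5 threshold are all choices made here for the example.

```python
def binarize_with_model(image, predict_patch, patch=4, threshold=0.5):
    """Binarize `image` patch by patch.

    `image` is a list of rows of floats in [0, 1].
    `predict_patch` (hypothetical) maps a patch x patch window to a
    patch x patch grid of foreground activations in [0, 1].
    Returns a grid with 1 for foreground and 0 for background.
    """
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            # Cut a fixed-size window; pad beyond the border with 1.0
            # (assumed to mean white background).
            window = [[image[y][x] if y < h and x < w else 1.0
                       for x in range(left, left + patch)]
                      for y in range(top, top + patch)]
            activations = predict_patch(window)
            # Threshold the activations and copy back the valid region.
            for dy in range(patch):
                for dx in range(patch):
                    y, x = top + dy, left + dx
                    if y < h and x < w:
                        out[y][x] = 1 if activations[dy][dx] >= threshold else 0
    return out


def toy_model(window):
    """Stand-in for a trained model: darker pixels get higher activation."""
    return [[1.0 - v for v in row] for row in window]


# Assumed toy input: values near 0 are dark ink, values near 1 are paper.
page = [[0.9, 0.1, 0.8],
        [0.2, 0.95, 0.3]]
result = binarize_with_model(page, toy_model, patch=2)
```

In a real setting `toy_model` would be replaced by the trained network's forward pass. The threshold trades missed foreground pixels against background noise, so it would normally be chosen on validation data.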

The influence of the

Acknowledgments

This work was partially supported by the Social Sciences and Humanities Research Council of Canada, the Spanish Ministerio de Ciencia, Innovación y Universidades through Juan de la Cierva - Formación grant (Ref. FJCI-2016-27873), and the Universidad de Alicante through grant GRE-16-04.

Jorge Calvo-Zaragoza received his Ph.D. degree in computer science from the University of Alicante in June 2016. He joined McGill University (Canada) in 2017 as a postdoctoral fellow. Currently, he is the recipient of a Juan de la Cierva - Formación research grant from the Spanish government. He has authored more than 30 papers in peer-reviewed journals and international conferences. His main interests are focused on Pattern Recognition and Document Analysis.

References (43)

  • N. Otsu

    A threshold selection method from gray-level histograms

    Automatica

    (1975)
  • W. Niblack

    An Introduction to Digital Image Processing

    (1985)
  • C. Wolf et al.

    Text localization, enhancement and binarization in multimedia documents

    Proceedings of the International Conference on Pattern Recognition

    (2002)
  • G. Lazzara et al.

    Efficient multiscale Sauvola's binarization

    Int. J. Doc. Anal. Recognit.

    (2014)
  • B. Su et al.

    Robust document image binarization technique for degraded document images

    IEEE Trans. Image Process.

    (2013)
  • N.R. Howe

    A laplacian energy for document binarization

    Proceedings of the 2011 International Conference on Document Analysis and Recognition (ICDAR)

    (2011)
  • N.R. Howe

    Document binarization with automatic parameter tuning

    Int. J. Doc. Anal. Recognit.

    (2013)
  • S. Katz et al.

    Direct visibility of point sets

    ACM Transactions on Graphics (TOG)

    (2007)
  • I. Pratikakis et al.

    ICFHR 2016 handwritten document image binarization contest (H-DIBCO 2016)

    Proceedings of the 15th International Conference on Frontiers in Handwriting Recognition, ICFHR, Shenzhen, China

    (2016)
  • Z. Chi et al.

    A two-stage binarization approach for document images

    Proceedings of the International Symposium on Intelligent Multimedia, Video and Speech Processing

    (2001)
  • J.L. Hidalgo et al.

    Enhancement and cleaning of handwritten data by using neural networks

    Proceedings of the 2nd Iberian Conference on Pattern Recognition and Image Analysis

    (2005)
Antonio-Javier Gallego is an assistant professor at the Department of Software and Computing Systems of the University of Alicante. He received B.S. and M.S. degrees in Computer Science from the University of Alicante in 2004, and a Ph.D. in Computer Science and Artificial Intelligence from the same university in 2012. He has more than 30 publications, including international journals, international conferences, books and book chapters. His research interests include Deep Learning, Pattern Recognition, and Computer Vision.
