DOI: 10.1145/3628797.3628976
Research article, Open Access

Impact of the ground truth quality for handwriting recognition

Published: 07 December 2023

ABSTRACT

Handwriting recognition is a key technology for accessing the content of old manuscripts, helping to preserve cultural heritage. Deep learning achieves impressive performance on this task. However, to reach its full potential, it requires a large amount of labeled data, which is difficult to obtain for ancient languages and scripts. Often, a trade-off has to be made between ground truth quantity and quality, as is the case for the recently introduced Bullinger database. It contains more than a hundred thousand labeled text line images of mostly premodern German and Latin texts, obtained by automatically aligning existing page-level transcriptions with text line images. However, the alignment process introduces systematic errors, such as wrongly hyphenated words. In this paper, we investigate the impact of such errors on training and evaluation and suggest means to detect and correct typical alignment errors.


Published in

SOICT '23: Proceedings of the 12th International Symposium on Information and Communication Technology
December 2023
1058 pages
ISBN: 9798400708916
DOI: 10.1145/3628797

Copyright © 2023 Owner/Author

This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery, New York, NY, United States

Qualifiers

• research-article
• Research
• Refereed limited

Acceptance Rates

Overall Acceptance Rate: 147 of 318 submissions, 46%