ABSTRACT
Handwriting recognition is a key technology for accessing the content of old manuscripts and thus helps to preserve cultural heritage. Deep learning performs impressively on this task. To reach its full potential, however, it requires a large amount of labeled data, which is difficult to obtain for ancient languages and scripts. Often, a trade-off has to be made between ground truth quantity and quality, as is the case for the recently introduced Bullinger database. It contains more than a hundred thousand labeled text line images of mostly premodern German and Latin texts, obtained by automatically aligning existing page-level transcriptions with text line images. However, the alignment process introduces systematic errors, such as wrongly hyphenated words. In this paper, we investigate the impact of such errors on training and evaluation and suggest means to detect and correct typical alignment errors.
Index Terms
- Impact of the ground truth quality for handwriting recognition