Combining OCR Models for Reading Early Modern Books

Seuret, Mathias; van der Loop, Janne; Weichselbaumer, Nikolaus; Mayr, Martin; Molnar, Janina; Hass, Tatjana; Christlein, Vincent

doi:10.1007/978-3-031-41734-4_21

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14191)

Included in the following conference series:

International Conference on Document Analysis and Recognition

Abstract

In this paper, we investigate the usage of fine-grained font recognition on OCR for books printed from the 15th to the 18th century. We used a newly created dataset for OCR of early printed books for which fonts are labeled with bounding boxes. We know not only the font group used for each character, but the locations of font changes as well. In books of this period, we frequently find font group changes mid-line or even mid-word that indicate changes in language. We consider 8 different font groups present in our corpus and investigate 13 different subsets: the whole dataset and text lines with a single font, multiple fonts, Roman fonts, Gothic fonts, and each of the considered fonts, respectively. We show that OCR performance is strongly impacted by font style and that selecting fine-tuned models with font group recognition has a very positive impact on the results. Moreover, we developed a system using local font group recognition in order to combine the output of multiple font recognition models, and show that while slower, this approach performs better not only on text lines composed of multiple fonts but on the ones containing a single font only as well.
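The idea of combining several OCR outputs through local font group recognition can be illustrated with a minimal sketch. It assumes that each OCR model emits one character distribution per horizontal frame of the text line and that a font group classifier emits one weight per model for each frame. The weighted-average merge, the greedy CTC-style decoding, the blank symbol, and all names and numbers below are assumptions made for illustration; they are not taken from the paper and need not match the authors' architecture.

```python
from typing import Dict, List

Frame = Dict[str, float]  # character -> probability for one frame
BLANK = "_"               # hypothetical CTC blank symbol


def combine_frames(ocr_outputs: List[List[Frame]],
                   font_weights: List[List[float]]) -> List[Frame]:
    """Merge per-frame character distributions from several OCR models.

    ocr_outputs[k][t] is the distribution of model k at frame t.
    font_weights[t][k] is the local weight given to model k at frame t.
    """
    combined: List[Frame] = []
    for t, weights in enumerate(font_weights):
        total = sum(weights)
        if total <= 0:
            raise ValueError(f"weights at frame {t} must sum to a positive value")
        frame: Frame = {}
        for k, weight in enumerate(weights):
            for char, prob in ocr_outputs[k][t].items():
                frame[char] = frame.get(char, 0.0) + (weight / total) * prob
        combined.append(frame)
    return combined


def greedy_ctc_decode(frames: List[Frame]) -> str:
    """Pick the best character per frame, collapse repeats, drop blanks."""
    decoded: List[str] = []
    previous = None
    for frame in frames:
        best = max(frame, key=frame.get)
        if best != previous and best != BLANK:
            decoded.append(best)
        previous = best
    return "".join(decoded)


# Toy line of four frames whose font group changes after the second frame.
MODEL_A = [{"a": 0.9, "o": 0.1}, {BLANK: 0.8, "a": 0.2},
           {"x": 0.6, "b": 0.4}, {BLANK: 1.0}]
MODEL_B = [{"o": 0.7, "a": 0.3}, {BLANK: 0.9, "o": 0.1},
           {"b": 0.9, "x": 0.1}, {BLANK: 1.0}]
WEIGHTS = [[0.9, 0.1], [0.9, 0.1], [0.1, 0.9], [0.1, 0.9]]

print(greedy_ctc_decode(MODEL_A))                                       # ax
print(greedy_ctc_decode(MODEL_B))                                       # ob
print(greedy_ctc_decode(combine_frames([MODEL_A, MODEL_B], WEIGHTS)))   # ab
```

In this toy case each model misreads the half of the line set in the other font group, while the locally weighted merge reads both halves correctly.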

M. Seuret and J. van der Loop contributed equally to this research.

Supported by the Deutsche Forschungsgemeinschaft (DFG) - Project number 460605811.

Notes

1. https://github.com/seuretm/combined-ocr
2. On an intermediate version of EMoFoG (Early Modern Font Groups), DOI 10.5281/zenodo.7880739, to appear.
3. We found that applying only the OCR models for font groups that have, somewhere in the batch, a score higher than 0.1 improved speed by roughly 40%, with CER differences of no more than 0.01%. However, these values are highly data-dependent.
4. No class labels are shown to the classifier when the COCR system is trained, so the classifier has no restriction on how it merges the OCR outputs.
5. https://pypi.org/project/editdistance/
6. https://github.com/tesseract-ocr/tessdata_best
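The speed-up described in note 3 and the CER it is measured against can be sketched as follows. This is a minimal illustration under stated assumptions: font group scores are taken to be one dictionary per text line, the font group names are placeholders, and CER is computed as the Levenshtein distance divided by the reference length. The pure-Python distance function stands in for the editdistance package from note 5 so that the sketch has no dependencies; none of the function names come from the authors' code.

```python
from typing import Dict, List


def select_font_groups(batch_scores: List[Dict[str, float]],
                       threshold: float = 0.1) -> List[str]:
    """Return the font groups scoring above `threshold` anywhere in the batch.

    Only the OCR models of the returned groups would then be run.
    """
    selected = set()
    for scores in batch_scores:
        for group, score in scores.items():
            if score > threshold:
                selected.add(group)
    return sorted(selected)


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(previous[j] + 1,                       # deletion
                               current[j - 1] + 1,                    # insertion
                               previous[j - 1] + (char_a != char_b)))  # substitution
        previous = current
    return previous[-1]


def cer(prediction: str, reference: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return levenshtein(prediction, reference) / len(reference)


# Hypothetical scores for a batch of two text lines.
batch = [
    {"fraktur": 0.95, "antiqua": 0.04, "textura": 0.01},
    {"fraktur": 0.30, "antiqua": 0.69, "textura": 0.01},
]
print(select_font_groups(batch))  # ['antiqua', 'fraktur']
print(cer("Bnch", "Buch"))        # 0.25
```

Here the model for the placeholder group "textura" is skipped for the whole batch because its score never exceeds 0.1. As the note says, how much time this saves and how little it costs in CER depends heavily on the data.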

Author information

Authors and Affiliations

Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
Mathias Seuret, Martin Mayr & Vincent Christlein
University of Mainz, Mainz, Germany
Janne van der Loop, Nikolaus Weichselbaumer, Janina Molnar & Tatjana Hass

Corresponding author

Correspondence to Mathias Seuret.

Editor information

Editors and Affiliations

TU Dortmund University, Dortmund, Germany
Gernot A. Fink
Adobe, College Park, MD, USA
Rajiv Jain
Osaka Metropolitan University, Osaka, Japan
Koichi Kise
Rochester Institute of Technology, Rochester, NY, USA
Richard Zanibbi

About this paper

Cite this paper

Seuret, M. et al. (2023). Combining OCR Models for Reading Early Modern Books. In: Fink, G.A., Jain, R., Kise, K., Zanibbi, R. (eds) Document Analysis and Recognition - ICDAR 2023. ICDAR 2023. Lecture Notes in Computer Science, vol 14191. Springer, Cham. https://doi.org/10.1007/978-3-031-41734-4_21

DOI: https://doi.org/10.1007/978-3-031-41734-4_21
Published: 19 August 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-41733-7
Online ISBN: 978-3-031-41734-4
eBook Packages: Computer Science, Computer Science (R0)
