Abstract
In this paper, we investigate the use of fine-grained font recognition to improve OCR of books printed from the 15th to the 18th century. We use a newly created dataset for OCR of early printed books in which fonts are labeled with bounding boxes, so we know not only the font group of each character but also the locations of font changes. In books of this period, font group changes frequently occur mid-line or even mid-word and indicate changes in language. We consider 8 different font groups present in our corpus and investigate 13 subsets: the whole dataset; text lines with a single font, with multiple fonts, with Roman fonts, and with Gothic fonts; and text lines of each of the 8 font groups. We show that OCR performance is strongly affected by font style and that using font group recognition to select font-specific fine-tuned models substantially improves the results. Moreover, we developed a system that uses local font group recognition to combine the outputs of multiple font-specific OCR models, and show that this approach, while slower, performs better not only on text lines composed of multiple fonts but also on those containing a single font.
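The abstract contrasts two strategies: choosing one fine-tuned model per text line from a line-level font group prediction, and combining the outputs of several models using local (per-position) font group recognition. The sketch below illustrates both under stated assumptions: every font-specific model is a CTC-style recognizer that emits per-frame probability distributions over a shared alphabet with the same number of frames, and the font classifier yields per-frame font group probabilities. The function names, the linear per-frame mixing rule, and the greedy decoder are illustrative choices, not the authors' implementation (note 4 indicates that their system merges the OCR outputs through a trained classifier).

```python
from typing import Dict, List, Sequence

# Assumed layout (hypothetical, for illustration only):
# - each font-specific OCR model returns T frames, each a probability
#   distribution over the same alphabet; index 0 is the CTC blank;
# - the font classifier returns, for each of the same T frames, a
#   probability per font group.
Frames = List[List[float]]  # T x C


def select_line_model(line_font_scores: Dict[str, float]) -> str:
    """Line-level selection: use the model of the most likely font group."""
    return max(line_font_scores, key=line_font_scores.get)


def combine_local(model_outputs: Dict[str, Frames],
                  frame_font_scores: List[Dict[str, float]]) -> Frames:
    """Frame-level combination: mix the models' distributions at each frame,
    weighted by the local font group probabilities (an assumed linear mix)."""
    combined: Frames = []
    for t, weights in enumerate(frame_font_scores):
        n_symbols = len(next(iter(model_outputs.values()))[t])
        mixed = [0.0] * n_symbols
        for group, weight in weights.items():
            for c, prob in enumerate(model_outputs[group][t]):
                mixed[c] += weight * prob
        combined.append(mixed)
    return combined


def ctc_greedy_decode(frames: Frames, alphabet: Sequence[str]) -> str:
    """Best-path CTC decoding: argmax per frame, collapse repeats, drop blanks."""
    out: List[str] = []
    prev = None
    for dist in frames:
        best = max(range(len(dist)), key=dist.__getitem__)
        if best != prev and best != 0:
            out.append(alphabet[best])
        prev = best
    return "".join(out)


if __name__ == "__main__":
    alphabet = ["_", "a", "b"]  # "_" stands for the blank at index 0
    # Toy line: a Roman "a" in frames 0-1, then a font change to a Gothic "b".
    outputs = {
        "roman":  [[0.1, 0.8, 0.1], [0.8, 0.1, 0.1], [0.2, 0.6, 0.2], [0.7, 0.2, 0.1]],
        "gothic": [[0.2, 0.2, 0.6], [0.7, 0.2, 0.1], [0.1, 0.1, 0.8], [0.8, 0.1, 0.1]],
    }
    frame_scores = [{"roman": 0.9, "gothic": 0.1}] * 2 + [{"roman": 0.1, "gothic": 0.9}] * 2

    chosen = select_line_model({"roman": 0.55, "gothic": 0.45})
    print(chosen, ctc_greedy_decode(outputs[chosen], alphabet))               # roman aa
    print(ctc_greedy_decode(combine_local(outputs, frame_scores), alphabet))  # ab
```

In this toy line, the single model selected for the whole line misreads the Gothic part, while the frame-wise mix follows the font change. The linear mix is only one possible rule; a trained merger is free to learn a different one.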
M. Seuret and J. van der Loop contributed equally to this research.
Supported by the Deutsche Forschungsgemeinschaft (DFG) - Project number 460605811.
Notes
- 1.
- 2. On an intermediate version of EMoFoG (Early Modern Font Groups), DOI 10.5281/zenodo.7880739, to appear.
- 3. We found that running only the OCR models of font groups that score higher than 0.1 somewhere in the batch improved speed by roughly 40%, with CER differences of at most 0.01%. However, these values are highly data-dependent.
- 4. When the COCR system is trained, no class labels are shown to the classifier, so that it is unrestricted in how it merges the OCR outputs.
- 5.
- 6.
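Note 3 describes a speed-up in which an OCR model is run on a batch only if its font group scores above 0.1 somewhere in that batch. A minimal sketch of that gating rule follows, assuming hypothetical per-frame font group scores stored as dictionaries and OCR models represented as plain callables; the names `groups_to_run` and `run_gated` are illustrative and do not come from the paper.

```python
from typing import Callable, Dict, List, Sequence

# batch_scores[i][t][group]: assumed font group probability for frame t of line i.
BatchScores = List[List[Dict[str, float]]]


def groups_to_run(batch_scores: BatchScores, threshold: float = 0.1) -> List[str]:
    """Return the font groups whose score exceeds `threshold` at least once
    anywhere in the batch (maximum over all lines and frames)."""
    best: Dict[str, float] = {}
    for line in batch_scores:
        for frame in line:
            for group, score in frame.items():
                best[group] = max(best.get(group, 0.0), score)
    return sorted(group for group, score in best.items() if score > threshold)


def run_gated(batch: Sequence[object],
              batch_scores: BatchScores,
              models: Dict[str, Callable[[Sequence[object]], object]],
              threshold: float = 0.1) -> Dict[str, object]:
    """Run only the OCR models of the active font groups; skipped models are
    never called, so their outputs are absent from any later merging step."""
    return {group: models[group](batch)
            for group in groups_to_run(batch_scores, threshold)}
```

Under a score-weighted combination, a skipped group would have contributed with a weight of at most 0.1 at every frame, which makes small output differences plausible. The actual time saved depends on how many font groups occur in each batch, which is why note 3 calls the figures data-dependent.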
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Seuret, M. et al. (2023). Combining OCR Models for Reading Early Modern Books. In: Fink, G.A., Jain, R., Kise, K., Zanibbi, R. (eds) Document Analysis and Recognition - ICDAR 2023. ICDAR 2023. Lecture Notes in Computer Science, vol 14191. Springer, Cham. https://doi.org/10.1007/978-3-031-41734-4_21
DOI: https://doi.org/10.1007/978-3-031-41734-4_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-41733-7
Online ISBN: 978-3-031-41734-4
eBook Packages: Computer Science, Computer Science (R0)