Consideration of the Word’s Neighborhood in GATs for Information Extraction in Semi-structured Documents

Belhadj, Djedjiga; Belaïd, Yolande; Belaïd, Abdel

doi:10.1007/978-3-030-86331-9_55

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12822)

Included in the following conference series:

International Conference on Document Analysis and Recognition

Abstract

Most administrative documents, such as invoices and payslips, are semi-structured. Extracting information from them remains challenging because their structure varies with the layout style of each issuing administration. In this work, we address this variation with a multi-layer Graph Attention Network (GAT). We propose a general structure for semi-structured documents. Based on this structure, we adopt a star sub-graph to exploit the context surrounding each word, so that neighboring words help locate and rank the searched words. The GAT makes it possible to exploit this neighborhood and to highlight the important neighboring words, which helps identify the searched words. Each graph node carries both textual and visual features. We evaluate the multi-layer GAT on three datasets: artificially generated invoices and payslips, and receipts from the SROIE ICDAR competition. On the latter dataset, we obtain an F1 score of 0.892.
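The star sub-graph and attention step described above can be illustrated with a minimal sketch. It is not the authors' implementation. The neighborhood rule (k nearest words by distance between box centers), the function names, and the random features are all assumptions made for illustration. The attention step follows the standard single-head formulation from the graph attention network literature: each word scores itself and its neighbors, normalizes the scores with a softmax, and aggregates the transformed features.

```python
import math
import random


def star_neighbors(centers, k):
    """Return, for each word, the indices of its k nearest other words.

    Assumption: "nearest" means Euclidean distance between box centers;
    the abstract does not specify the neighborhood rule. Requires k < len(centers).
    """
    result = []
    for i, (xi, yi) in enumerate(centers):
        others = [j for j in range(len(centers)) if j != i]
        others.sort(key=lambda j: math.hypot(centers[j][0] - xi, centers[j][1] - yi))
        result.append(others[:k])
    return result


def matvec(W, x):
    """Multiply matrix W (one row per output dimension) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def gat_layer(h, neighbors, W, a):
    """One single-head graph attention layer over star sub-graphs.

    h: node feature vectors; W: weight matrix; a: attention vector of
    length 2 * len(W). Returns (new node features, attention weights).
    """
    z = [matvec(W, x) for x in h]  # linear transform of every node
    out, attn = [], []
    for i, nbrs in enumerate(neighbors):
        idx = [i] + list(nbrs)  # center word plus its star neighbors
        # Score each (center, neighbor) pair on the concatenated features.
        scores = [leaky_relu(sum(c * v for c, v in zip(a, z[i] + z[j]))) for j in idx]
        m = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        alpha = [e / total for e in exps]
        # Attention-weighted sum of the transformed neighbor features.
        out.append([sum(al * z[j][d] for al, j in zip(alpha, idx)) for d in range(len(W))])
        attn.append(alpha)
    return out, attn


# Toy document: five word boxes given by their centers (hypothetical values).
random.seed(0)
centers = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5)]
# Random stand-ins for the textual and visual features of each node.
h = [[random.gauss(0, 1) for _ in range(4)] for _ in centers]
W = [[random.gauss(0, 1) for _ in range(4)] for _ in range(3)]
a = [random.gauss(0, 1) for _ in range(6)]

neighbors = star_neighbors(centers, k=2)
out, attn = gat_layer(h, neighbors, W, a)
```

Each row of `attn` holds the weights a word assigns to itself and its star neighbors, so a large weight marks a neighbor that the layer treats as informative. Stacking several such layers, as in a multi-layer GAT, lets information travel beyond the immediate neighborhood.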

Supported by BPI DeepTech.

Acknowledgements

This work was carried out within the framework of the BPI DeepTech project, in partnership between the University of Lorraine (Ref. UL: GECO/2020/00331), the CNRS, the INRIA Lorraine and the company FAIR&SMART. The authors would like to thank all the partners for their fruitful collaboration.

Author information

Authors and Affiliations

Université de Lorraine-LORIA, Campus Scientifique, 54500, Vandoeuvre-Lès-Nancy, France
Djedjiga Belhadj, Yolande Belaïd & Abdel Belaïd

Editor information

Editors and Affiliations

Universitat Autònoma de Barcelona, Barcelona, Spain
Josep Lladós
Lehigh University, Bethlehem, PA, USA
Daniel Lopresti
Kyushu University, Fukuoka-shi, Japan
Seiichi Uchida

About this paper

Cite this paper

Belhadj, D., Belaïd, Y., Belaïd, A. (2021). Consideration of the Word’s Neighborhood in GATs for Information Extraction in Semi-structured Documents. In: Lladós, J., Lopresti, D., Uchida, S. (eds) Document Analysis and Recognition – ICDAR 2021. ICDAR 2021. Lecture Notes in Computer Science, vol 12822. Springer, Cham. https://doi.org/10.1007/978-3-030-86331-9_55

DOI: https://doi.org/10.1007/978-3-030-86331-9_55
Published: 02 September 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86330-2
Online ISBN: 978-3-030-86331-9
eBook Packages: Computer Science, Computer Science (R0)
