
Consideration of the Word’s Neighborhood in GATs for Information Extraction in Semi-structured Documents

  • Conference paper
Document Analysis and Recognition – ICDAR 2021 (ICDAR 2021)

Abstract

Most administrative documents take a semi-structured form (invoices, payslips, etc.). Extracting information from this type of document remains challenging because its structure varies with the layout styles adopted by different administrations. In this work, we address this variability using a multi-layer Graph Attention Network (GAT). We propose a general structure for semi-structured documents. Based on this structure, we adopt a star sub-graph to exploit the surrounding context of each word, allowing neighboring words to help locate and rank the searched words. The GAT exploits this neighborhood and highlights the neighboring words that matter most, so that the target words are more likely to be identified correctly. Each graph node carries both textual and visual features. We evaluate the multi-layer GAT on three datasets: invoices and payslips (generated artificially), and receipts (from the ICDAR 2019 SROIE competition). On the receipts dataset, we obtain an F1 score of 0.892.
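The abstract describes three ingredients: graph nodes that carry textual and visual features, a star-shaped neighborhood around each word, and attention over those neighbors. The plain-Python sketch below illustrates these ideas under stated assumptions. The abstract does not specify how the star sub-graph is built, so the hypothetical helper `star_neighbors` links each word to its nearest word in each of four directions. The hypothetical `featurize` is a toy stand-in for real word embeddings plus layout features, and the weights `W` and `a` are arbitrary, untrained values. `gat_layer` follows the standard single-head graph attention formulation of Veličković et al. (2018), not necessarily the exact configuration used in this paper.

```python
import math
from dataclasses import dataclass


@dataclass
class Word:
    """An OCR token with its bounding box (x0, y0, x1, y1) in page coordinates."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def center(self):
        return ((self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2)


def star_neighbors(words):
    """Hypothetical star neighborhood: connect each word to its nearest word
    in each of four directions (left, right, up, down; y grows downward)."""
    adj = {}
    for i, w in enumerate(words):
        cx, cy = w.center
        best = {}  # direction -> (distance, neighbor index)
        for j, v in enumerate(words):
            if i == j:
                continue
            dx, dy = v.center[0] - cx, v.center[1] - cy
            if abs(dx) >= abs(dy):
                direction = "right" if dx > 0 else "left"
            else:
                direction = "down" if dy > 0 else "up"
            dist = math.hypot(dx, dy)
            if direction not in best or dist < best[direction][0]:
                best[direction] = (dist, j)
        adj[i] = sorted(j for _, j in best.values())
    return adj


def featurize(word, page_w, page_h):
    """Hypothetical toy node features: two textual cues plus the normalized
    box center (visual/layout cue). A real system would use learned embeddings."""
    digit_ratio = sum(c.isdigit() for c in word.text) / len(word.text)
    cx, cy = word.center
    return [digit_ratio, min(len(word.text), 10) / 10, cx / page_w, cy / page_h]


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def elu(x):
    return x if x > 0 else math.exp(x) - 1


def gat_layer(H, adj, W, a):
    """One single-head graph attention layer.
    H: node features (N x F); W: weights (F' x F); a: attention vector (2F')."""
    Wh = [[sum(wk * hk for wk, hk in zip(row, h)) for row in W] for h in H]
    out, attn = [], {}
    for i in range(len(H)):
        nbrs = [i] + adj[i]  # include a self-loop, as in the original GAT
        # e_ij = LeakyReLU(a^T [W h_i || W h_j]); list "+" concatenates
        scores = [leaky_relu(sum(ak * zk for ak, zk in zip(a, Wh[i] + Wh[j])))
                  for j in nbrs]
        m = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]
        attn[i] = dict(zip(nbrs, alphas))
        # h'_i = ELU(sum_j alpha_ij * W h_j)
        out.append([elu(sum(al * Wh[j][k] for al, j in zip(alphas, nbrs)))
                    for k in range(len(W))])
    return out, attn


# Usage: four receipt tokens; W and a are arbitrary, untrained values.
words = [Word("TOTAL", 10, 100, 60, 110), Word("12.50", 150, 100, 200, 110),
         Word("TAX", 10, 40, 50, 50), Word("1.20", 150, 40, 190, 50)]
adj = star_neighbors(words)
H = [featurize(w, 200, 120) for w in words]
W = [[0.5, -0.2, 0.1, 0.3], [0.0, 0.4, -0.3, 0.2], [0.2, 0.1, 0.6, -0.1]]
a = [0.3, -0.1, 0.2, 0.4, 0.1, -0.2]
out, attn = gat_layer(H, adj, W, a)
print(adj)  # {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
```

Stacking several calls to `gat_layer` (feeding `out` back in with new weights) gives a multi-layer GAT in which each word also receives information from its neighbors' neighbors. The dictionary `attn[i]` shows how much weight word `i` assigns to each member of its star, which is the kind of signal the abstract relies on to highlight useful neighboring words.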

Supported by BPI DeepTech.



Acknowledgements

This work was carried out within the framework of the BPI DeepTech project, in partnership between the University of Lorraine (Ref. UL: GECO/2020/00331), the CNRS, the INRIA Lorraine and the company FAIR&SMART. The authors would like to thank all the partners for their fruitful collaboration.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Belhadj, D., Belaïd, Y., Belaïd, A. (2021). Consideration of the Word’s Neighborhood in GATs for Information Extraction in Semi-structured Documents. In: Lladós, J., Lopresti, D., Uchida, S. (eds) Document Analysis and Recognition – ICDAR 2021. ICDAR 2021. Lecture Notes in Computer Science, vol 12822. Springer, Cham. https://doi.org/10.1007/978-3-030-86331-9_55


  • DOI: https://doi.org/10.1007/978-3-030-86331-9_55

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86330-2

  • Online ISBN: 978-3-030-86331-9

  • eBook Packages: Computer Science, Computer Science (R0)
