DOI: 10.1145/3401832.3402681
research-article

Ontology mediated information extraction in financial domain with Mastro System-T

Published: 15 June 2020

ABSTRACT

Information extraction (IE) is the task of turning text documents into a structured form so that the information they contain can be processed automatically. Ontology Mediated Information Extraction (OMIE) is a new paradigm for IE that exploits the semantic knowledge expressed in ontologies to improve query answering over unstructured data (specifically, raw text). In this paper we present Mastro System-T, an OMIE tool developed jointly by the University of Rome "La Sapienza" and IBM Research Almaden, and its first application in a financial domain: facilitating access to, and sharing of, data extracted from the EDGAR system.
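
As a rough illustration of the OMIE idea described in the abstract, the following minimal sketch extracts facts from raw text and then answers a query through a toy class hierarchy. It is not Mastro System-T and does not use its API: the sample text, the regular expression, the class names, and the functions `extract`, `superclasses`, and `instances_of` are all hypothetical, chosen only to show how ontology axioms can return answers that the text never states explicitly.

```python
import re

# Hypothetical raw text, loosely styled after a corporate filing.
TEXT = (
    "Jane Roe serves as Chief Executive Officer of Acme Corp. "
    "John Doe serves as Director of Acme Corp."
)

# Extraction layer: one hand-written pattern standing in for an IE rule.
PATTERN = re.compile(
    r"(?P<person>[A-Z][a-z]+ [A-Z][a-z]+) serves as "
    r"(?P<role>Chief Executive Officer|Director) of "
    r"(?P<company>[A-Z][A-Za-z]+ Corp)"
)

# Mapping layer: each extracted role label populates an ontology class.
ROLE_TO_CLASS = {"Chief Executive Officer": "CEO", "Director": "Director"}

# Ontology layer: toy subclass axioms (child -> parent), assumed for the example.
SUBCLASS_OF = {"CEO": "Officer", "Officer": "Insider", "Director": "Insider"}


def extract(text):
    """Return (person, ontology class) pairs found in the text."""
    return [(m["person"], ROLE_TO_CLASS[m["role"]]) for m in PATTERN.finditer(text)]


def superclasses(cls):
    """Return the class together with every class it is subsumed by."""
    result = {cls}
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        result.add(cls)
    return result


def instances_of(query_class, text):
    """Answer 'who is a <query_class>?' using extraction plus subclass reasoning."""
    return sorted(p for p, c in extract(text) if query_class in superclasses(c))


if __name__ == "__main__":
    print(instances_of("Officer", TEXT))
    print(instances_of("Insider", TEXT))
```

The word "Insider" never occurs in the sample text, yet the query for it returns both people because the subclass axioms link the extracted classes to it. A real OMIE system would use a proper ontology language and extraction rules rather than a dictionary and a single regular expression.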

      • Published in

        DSMM '20: Proceedings of the Sixth International Workshop on Data Science for Macro-Modeling
        June 2020
        23 pages
        ISBN: 9781450380300
        DOI: 10.1145/3401832

        Copyright © 2020 ACM

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Acceptance Rates

        Overall Acceptance Rate: 32 of 64 submissions, 50%
