Elsevier

Knowledge-Based Systems

Volume 135, 1 November 2017, Pages 192-203

Event-based knowledge reconciliation using frame embeddings and frame similarity

https://doi.org/10.1016/j.knosys.2017.08.014

Abstract

This paper proposes an evolution of MERGILO, a tool for reconciling knowledge graphs extracted from text using graph alignment and word similarity. The reconciled knowledge graphs are typically used for multi-document summarization or to detect knowledge evolution across document series. The main improvement concerns event reconciliation, i.e., reconciling knowledge graphs generated from texts that describe two similar events differently. To obtain a complete semantic representation of events, we use the FRED semantic web machine reader jointly with Framester, a linguistic linked data hub represented using a novel formal semantics for frames. Framester is used to enrich the extracted event knowledge with semantic frames. We extend MERGILO with similarities based on the graph structure of semantic frames and on the subsumption hierarchy of semantic roles as defined in Framester. Using an evaluation strategy similar to the one used for MERGILO, we show the improvement of the new approach (MERGILO plus semantic frame/role similarities) over the baseline.

Introduction

Several approaches have been proposed for extracting knowledge graphs from text. These knowledge graphs are generated with the aim of making unstructured text machine-readable [1]. When multiple texts describe similar events, it is more efficient and usable to provide the machine with a combination of the graphs generated from those texts. Using this merged graph, a machine reader can obtain the knowledge contained in multiple texts from a single consolidated graph instead of reading several graphs. This problem, termed “Knowledge Reconciliation” (KR), has recently been addressed by MERGILO [2], a tool for reconciling knowledge graphs using graph alignment and word similarity. The reconciled knowledge graphs can further be used by specific NLP applications, in particular graph-based text summarization (which aims at summarizing knowledge represented in multiple closely related pieces of text), or for assessing sentence or document similarity, among other tasks.

The current study mainly targets the problem of knowledge reconciliation from the perspective of events. In a text, a complete description of an event is syntactically denoted by a verb, since it defines a relation between event participants. The first step in event-based knowledge reconciliation is to extract event-oriented knowledge graphs. To do so, we use FRED, a machine reader presented in [1], which generates an RDF/OWL graph from any open-domain input text.

To deal with different lexical units describing the same or similar events, we enhance the existing pipeline by enriching the knowledge graphs generated by FRED with semantic frames as defined in FrameNet. For this purpose, this study makes use of the mappings between VerbNet (i.e., VerbNet verb classes and VerbNet roles) and FrameNet contained in Framester [3]. Framester is a linguistic linked data hub formulated using a novel formal semantics for frames to improve semantic interoperability between linguistic resources. Framester uses the RDF version of FrameNet [4], formalizes the FrameNet graph in OWL, and introduces a very rich subsumption hierarchy over FrameNet frame elements (semantic roles).

We use Framester graph representations to improve the similarity computed between nodes and between edges, where nodes represent frames and edges represent roles. When different verbs denote similar events, i.e., when different verbs evoke different frames that are connected in the FrameNet graph through the semantic relations already defined in FrameNet (such as Inheritance, SubFrame, ...), frame and semantic role similarity measures can greatly improve on the simple string matching techniques introduced in MERGILO. To this end, we consider similarities based on the graph structure of the FrameNet frames as well as on the subsumption hierarchy of the semantic roles defined in Framester. The FrameNet graph organizes frames using semantic relations; to benefit from this graph structure, we adapt WordNet similarity measures [5] to the FrameNet graph. We further exploit vector representations of frames, using the FrameNet graph and the subsumption hierarchy of roles as represented in Framester. We follow the RDF2Vec approach [6] to generate graph-based frame embeddings, referred to as Frame2Vec. These graph-based embeddings rely on graph mining algorithms such as graph walks and graph kernels to traverse the graph, and the resulting traversals are used to generate its vector representations. To compute the similarity between two frames and between two roles, this study uses the adapted WordNet similarities and cosine similarity, obtaining better consolidation between multiple graphs, which leads to an improvement over the results of MERGILO [2], a baseline algorithm for knowledge reconciliation. MERGILO already computes the similarity between the roles represented as edges in the FRED graphs, but it relies solely on string matching to decide whether two roles are similar. The embeddings can be used in any NLP application; in the current scenario, however, we use them for knowledge reconciliation.
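To make the idea of adapting a WordNet measure to a frame graph concrete, here is a minimal sketch of path similarity, one of the simplest WordNet measures, computed as 1 / (1 + shortest path length). It is only an illustration under stated assumptions: the toy edges below are invented for the example and are not taken from the actual FrameNet relation graph, the function names are hypothetical, and the text above does not say which specific measures were adapted.

```python
from collections import deque

# Toy frame graph. The edges are illustrative assumptions only and are not
# taken from the actual FrameNet relation graph.
TOY_FRAME_EDGES = [
    ("Conquering", "Hostile_encounter"),
    ("Attack", "Hostile_encounter"),
    ("Hostile_encounter", "Event"),
    ("Cooking_creation", "Event"),
]


def build_adjacency(edges):
    """Build an undirected adjacency map from (frame, frame) pairs."""
    adjacency = {}
    for left, right in edges:
        adjacency.setdefault(left, set()).add(right)
        adjacency.setdefault(right, set()).add(left)
    return adjacency


def shortest_path_length(adjacency, source, target):
    """Return the number of edges on the shortest path, or None if unreachable."""
    if source not in adjacency or target not in adjacency:
        return None
    queue = deque([(source, 0)])
    visited = {source}
    while queue:
        frame, distance = queue.popleft()
        if frame == target:
            return distance
        for neighbour in adjacency[frame]:
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append((neighbour, distance + 1))
    return None


def path_similarity(adjacency, frame_a, frame_b):
    """WordNet-style path similarity: 1 / (1 + shortest path length)."""
    distance = shortest_path_length(adjacency, frame_a, frame_b)
    return 0.0 if distance is None else 1.0 / (1.0 + distance)


adjacency = build_adjacency(TOY_FRAME_EDGES)
print(path_similarity(adjacency, "Conquering", "Attack"))
print(path_similarity(adjacency, "Conquering", "Cooking_creation"))
```

In this toy graph the two hostile-event frames are two edges apart and score 1/3, while the unrelated frame is three edges away and scores 1/4. Treating the relations as undirected edges is a simplification chosen for the sketch.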

The paper is organized as follows. Section 2 introduces the state of the art and related work. Section 3 lists the data sources, resources and tools adopted in our methodology. Section 4 gives some details of MERGILO and its functionalities, which serve as the basis for Section 5, which explains how frame semantics has been employed to improve MERGILO. Section 6 presents a precision-recall analysis of the presented approach on the dataset introduced in [2]. Finally, Section 7 concludes the paper with discussion and remarks, and highlights some future directions.

Section snippets

From text to knowledge graphs

Given the large amount of unstructured text available, extracting structured information and knowledge from it and integrating them into a coherent knowledge graph has become a key challenge. Several applications rely on extracting such structures, including digital assistants (Siri, Alexa, Cortana, and Google Now), question answering, and summarization. Projects such as Never Ending Language Learning (NELL) [7], OpenIE [8], YAGO [9], and Google Knowledge Vault [10] proposed various

VerbNet

VerbNet [37] is a broad-coverage English verb lexicon with links to other data sources such as WordNet [38] and FrameNet [39]. VerbNet contains semantic roles and patterns that allow verbs to be grouped into verb classes, known as Levin’s classes. It generalizes verbs based on their shared syntactic behavior. These verb classes are structured into a hierarchy of parent classes and subclasses. For example, the verb conquer is a member of the class subjugate-42.3, which means to bring under domination.

MERGILO

MERGILO [2] is a method for generating and integrating knowledge graphs extracted from multiple text documents by using FRED, a machine reader. Given two input sentences, it extracts the associated knowledge graph using FRED.
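As a rough illustration of what integrating two graphs means once an alignment is available, the sketch below merges two sets of triples by renaming aligned nodes. It is not MERGILO's algorithm: the alignment is assumed to be given, the function name `merge_graphs` is hypothetical, and the triples are simplified toy data rather than actual FRED output.

```python
def merge_graphs(graph_a, graph_b, alignment):
    """Union of two triple sets after renaming the aligned nodes of graph_b.

    `alignment` maps node names of graph_b to the node names of graph_a
    they were matched with; unmatched nodes keep their own name.
    """
    def rename(node):
        return alignment.get(node, node)

    renamed_b = {(rename(s), p, rename(o)) for (s, p, o) in graph_b}
    return set(graph_a) | renamed_b


# Simplified toy triples (subject, role, object); not actual FRED output.
graph_a = {
    ("conquer_1", "agent", "Spaniards"),
    ("conquer_1", "patient", "Incas"),
}
graph_b = {
    ("attack_1", "agent", "Spaniards_b"),
    ("attack_1", "patient", "Incas_b"),
}

# Aligning only the entities keeps two separate event nodes.
entity_alignment = {"Spaniards_b": "Spaniards", "Incas_b": "Incas"}
merged_entities = merge_graphs(graph_a, graph_b, entity_alignment)

# Also aligning the event nodes collapses the two events into one.
full_alignment = dict(entity_alignment, attack_1="conquer_1")
merged_events = merge_graphs(graph_a, graph_b, full_alignment)

print(len(merged_entities), len(merged_events))
```

The difference between the two results shows why event-level alignment matters: with entity alignment alone the merged graph still holds two event nodes, whereas aligning the events as well yields a single consolidated description.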

Event-based knowledge reconciliation

Let us consider the two sentences: “The Spaniards conquered the Incas.” and “The Incas were attacked by the Spaniards.” Both sentences describe actions related to the same past happening, i.e., the event of an attack on, or an invasion of, the Incas by the Spaniards. In such a case, the similarity measures introduced by MERGILO cannot effectively capture the similarity between the two events because the two verbs are different. Figs. 3 and 6 show the FRED graphs of the first
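The gap between label matching and frame-level similarity for this pair of verbs can be sketched as follows. Everything here is an assumption made for illustration: `difflib.SequenceMatcher` stands in for a generic string matcher (it is not MERGILO's measure), the verb-to-frame lookup is hypothetical, and the three-dimensional frame vectors are made-up numbers rather than trained embeddings.

```python
import math
from difflib import SequenceMatcher


def string_similarity(label_a, label_b):
    """Character-level similarity in [0, 1], a stand-in for string matching."""
    return SequenceMatcher(None, label_a.lower(), label_b.lower()).ratio()


def cosine_similarity(vector_a, vector_b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(vector_a, vector_b))
    norm_a = math.sqrt(sum(x * x for x in vector_a))
    norm_b = math.sqrt(sum(y * y for y in vector_b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical verb-to-frame lookup and made-up frame vectors.
VERB_TO_FRAME = {"conquer": "Conquering", "attack": "Attack"}
FRAME_VECTORS = {
    "Conquering": [0.9, 0.4, 0.1],
    "Attack": [0.8, 0.5, 0.2],
}

label_score = string_similarity("conquer", "attack")
frame_score = cosine_similarity(
    FRAME_VECTORS[VERB_TO_FRAME["conquer"]],
    FRAME_VECTORS[VERB_TO_FRAME["attack"]],
)
print(f"string similarity: {label_score:.3f}")
print(f"frame similarity:  {frame_score:.3f}")
```

With these invented vectors the string score is low and the frame score is high, which is the situation the two example sentences are meant to illustrate. Real scores depend entirely on the embeddings actually used.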

Experimentation

We conducted several experiments to evaluate the feasibility of our approach. We built on the EECB 1.0 [19] gold standard for cross-document coreference resolution (CCR) (cluster 1) and converted the coreferences between mentions into coreferences between entities with a semi-automatic process. The EECB gold standard is an extension of ECB [18], a corpus annotated with event coreferences, that also contains entity coreference annotations. ECB contains text found through Google Search that was annotated with mentions, events

Conclusions

This paper presents an extension of MERGILO, a tool for reconciling knowledge graphs using graph alignment and word similarity. This study exploits Framester, a linguistic data hub formulated using a novel formal semantics for frames, in order to enhance semantic interoperability between linguistic resources. This paper introduces several ways for improving the basic MERGILO pipeline to deal with event-based knowledge reconciliation. In particular, several path-based similarity measures for

Acknowledgment

The research leading to these results has received funding from the European Union Horizon 2020 Framework Programme for Research and Innovation (2014-2020) under grant agreement 643808, Project MARIO “Managing active and healthy aging with use of caring service robots”, as well as from a public grant overseen by the French National Research Agency (ANR) as part of the program “Investissements d’Avenir” (reference: ANR-10-LABX-0083). Moreover, the authors gratefully acknowledge Sardinia Regional

References (62)

  • J. Hoffart et al., YAGO2: a spatially and temporally enhanced knowledge base from Wikipedia, Artif. Intell. (2013)
  • S. Brin, Extracting patterns and relations from the world-wide web, in: Proceedings of the 1998 International Workshop...
  • E. Agichtein et al., Snowball: extracting relations from large plain-text collections, Proceedings of the Fifth ACM Conference on Digital Libraries (2000)
  • M. Banko et al., Open information extraction from the web, Proceedings of the 20th International Joint Conference on Artificial Intelligence (2007)
  • S. Dutta et al., Cross-document co-reference resolution using sample-based clustering with knowledge enrichment, Trans. Assoc. Comput. Linguist. (2015)
  • D. Rao et al., Streaming cross document entity coreference resolution, 23rd International Conference on Computational Linguistics (2010)
  • S. Singh et al., Large-scale cross-document coreference using distributed inference and hierarchical models, Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (2011)
  • N. Andrews et al., Robust entity clustering via phylogenetic inference, Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (2014)
  • C. Bejan et al., Unsupervised event coreference resolution, Comput. Linguist. (2014)
  • H. Lee et al., Joint entity and event coreference resolution across documents, Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (2012)
  • J. Euzenat et al., Ontology matching (2013)
  • F.M. Suchanek et al., PARIS: Probabilistic alignment of relations, instances, and schema, Proceedings of the VLDB Endowment (2011)
  • S. Lacoste-Julien et al., Sigma: simple greedy matching for aligning large knowledge bases, KDD2013 (2013)
  • J.T. Vogelstein et al., Fast approximate quadratic programming for graph matching, PLoS ONE (2015)
  • G.W. Klau, A new graph-based method for pairwise global network alignment, BMC Bioinform. (2009)
  • M. Mongiovì et al., Global alignment of protein–protein interaction networks, Data Mining for Systems Biology: Methods and Protocols (2013)
  • D.R. Recupero, Efficient graph matching, Encyclopedia of Data Warehousing and Mining (2009)
  • J.A. Bullinaria et al., Extracting semantic representations from word co-occurrence statistics: stop-lists, stemming, and SVD, Behav. Res. Methods (2012)
  • S. Deerwester et al., Indexing by latent semantic analysis, J. Am. Soc. Inf. Sci. (1990)
  • T. Mikolov et al., Efficient estimation of word representations in vector space, CoRR (2013)
  • Q.V. Le et al., Distributed representations of sentences and documents, Proceedings of the 31st International Conference on Machine Learning, ICML 2014, Beijing, China, 21–26 June 2014 (2014)