Event-based knowledge reconciliation using frame embeddings and frame similarity
Introduction
Several approaches have been proposed for extracting knowledge graphs from text. These knowledge graphs are generated with the aim of making unstructured text machine-readable [1]. When multiple texts describe similar events, it is more efficient and useful to provide the machine with a single combination of the graphs generated from each text. Using this merged graph, a machine reader can obtain the knowledge contained in multiple texts from one consolidated graph instead of reading several graphs. This problem, termed “Knowledge Reconciliation” (KR), has recently been addressed by MERGILO [2], a tool for reconciling knowledge graphs using graph alignment and word similarity. Reconciled knowledge graphs can further be used by specific NLP applications, in particular by graph-based text summarization (which aims at summarizing the knowledge represented in multiple closely related pieces of text) and for assessing sentence or document similarity.
The current study targets knowledge reconciliation from the perspective of events. In a text, an event is typically denoted syntactically by a verb, since the verb defines a relation between the event participants. The first step in event-based knowledge reconciliation is to extract event-oriented knowledge graphs. For this we use FRED, a machine reader presented in [1], which generates an RDF/OWL graph from any open-domain input text.
To deal with different lexical units describing the same or similar events, we extend the existing pipeline by enriching the knowledge graphs generated by FRED with semantic frames as defined in FrameNet. For this purpose, this study also uses the mappings between VerbNet (i.e., VerbNet verb classes and VerbNet roles) and FrameNet contained in Framester [3]. Framester is a linguistic linked data hub built on a novel formal semantics for frames, with the aim of improving semantic interoperability between linguistic resources. Framester uses the RDF version of FrameNet [4], formalizes the FrameNet graph in OWL, and introduces a very rich subsumption hierarchy over FrameNet frame elements (semantic roles).
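A role subsumption hierarchy lets two roles with different names be related through a shared, more general role. The following is a minimal sketch assuming a hypothetical, hand-written fragment of such a hierarchy: the `ROLE_PARENT` entries and the helper functions are illustrative only and are not Framester's data or API.

```python
from typing import Dict, List, Optional

# HYPOTHETICAL toy fragment of a role subsumption hierarchy, in the spirit of
# the one Framester defines over FrameNet frame elements. Each role maps to
# its direct parent; the names are illustrative, not copied from Framester.
ROLE_PARENT: Dict[str, str] = {
    "Conquering.Conqueror": "Agent",
    "Attack.Assailant": "Agent",
    "Conquering.Theme": "Patient",
    "Attack.Victim": "Patient",
    "Agent": "Role",
    "Patient": "Role",
}


def ancestors(role: str) -> List[str]:
    """Return the role followed by all of its superroles, nearest first."""
    chain = [role]
    while chain[-1] in ROLE_PARENT:
        chain.append(ROLE_PARENT[chain[-1]])
    return chain


def least_common_subsumer(r1: str, r2: str) -> Optional[str]:
    """Most specific role that subsumes both r1 and r2, or None."""
    seen = set(ancestors(r1))
    for candidate in ancestors(r2):
        if candidate in seen:
            return candidate
    return None


def is_subsumed_by(role: str, general: str) -> bool:
    """True if `general` is `role` itself or one of its superroles."""
    return general in ancestors(role)


if __name__ == "__main__":
    print(least_common_subsumer("Conquering.Conqueror", "Attack.Assailant"))  # Agent
    print(least_common_subsumer("Conquering.Conqueror", "Attack.Victim"))  # Role
```

String matching would treat `Conquering.Conqueror` and `Attack.Assailant` as unrelated; walking up the hierarchy shows that both specialize `Agent`, which is the kind of evidence a role similarity measure can exploit.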
We use Framester graph representations to improve the similarity computation between nodes and edges, where nodes represent frames and edges represent roles. When different verbs denote similar events, i.e., when different verbs evoke different frames that are connected in the FrameNet graph through the semantic relations FrameNet already defines (such as Inheritance and SubFrame), frame and semantic role similarity measures can substantially improve on the simple string matching used in MERGILO. MERGILO already computes the similarity between the roles represented as edges in the FRED graphs, but it only performs string matching to decide whether two roles are similar. To go beyond this, we consider similarities based on the graph structure of FrameNet frames as well as on the subsumption hierarchy of semantic roles defined in Framester. Since the FrameNet graph organizes frames through semantic relations, we adapt WordNet similarity measures [5] to the FrameNet graph to benefit from this structure.
We further exploit vector representations of frames built from the FrameNet graph and the subsumption hierarchy of roles as represented in Framester. Following the RDF2Vec approach [6], we generate graph-based frame embeddings, referred to as Frame2Vec. These embeddings rely on graph mining techniques, namely graph walks and graph kernels, which extract sequences from the graph; these sequences are then used to generate its vector representations. To compute the similarity between two frames and between two roles, this study uses WordNet-style similarities and cosine similarity, which yields a better consolidation of multiple graphs and an improvement over the results of a baseline knowledge reconciliation algorithm, MERGILO [2]. These embeddings could be used for other NLP applications as well; in the current scenario we use them for knowledge reconciliation.
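Both kinds of frame similarity described above can be sketched on a toy graph. This is a minimal, dependency-free sketch under stated assumptions, not the authors' Frame2Vec implementation: the edges in `TRIPLES` are hypothetical, `path_similarity` implements only the WordNet path measure 1 / (1 + shortest path length), and `graph_walks` enumerates depth-limited RDF2Vec-style walks. RDF2Vec would train word2vec on those walks; here a plain co-occurrence count vector (`context_vector`) stands in for the trained embedding so that cosine similarity can be shown without external libraries.

```python
import math
from collections import Counter, defaultdict, deque
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

# HYPOTHETICAL toy frame graph: the labels are FrameNet-style frame names,
# but these edges are illustrative and not taken from FrameNet itself.
TRIPLES: List[Triple] = [
    ("Conquering", "inheritsFrom", "Intentionally_act"),
    ("Attack", "inheritsFrom", "Intentionally_act"),
    ("Intentionally_act", "inheritsFrom", "Event"),
    ("Motion", "inheritsFrom", "Event"),
]


def path_similarity(a: str, b: str, triples: List[Triple]) -> float:
    """WordNet-style path measure 1 / (1 + shortest path length), treating
    frame-to-frame relations as undirected edges; 0.0 if no path exists."""
    adj: Dict[str, Set[str]] = defaultdict(set)
    for s, _, o in triples:
        adj[s].add(o)
        adj[o].add(s)
    dist = {a: 0}
    queue = deque([a])
    while queue:  # breadth-first search from a
        node = queue.popleft()
        if node == b:
            return 1.0 / (1 + dist[node])
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return 0.0


def graph_walks(root: str, triples: List[Triple], depth: int) -> List[List[str]]:
    """All directed walks of up to `depth` hops from `root`, as token
    sequences alternating vertices and predicates (RDF2Vec-style walks)."""
    out: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for s, p, o in triples:
        out[s].append((p, o))
    walks: List[List[str]] = []
    frontier = [[root]]
    for _ in range(depth):
        next_frontier = []
        for walk in frontier:
            edges = out.get(walk[-1], [])
            if not edges:
                walks.append(walk)  # dead end: keep the shorter walk
            for p, o in edges:
                next_frontier.append(walk + [p, o])
        frontier = next_frontier
    walks.extend(frontier)  # walks that reached the full depth
    return walks


def context_vector(entity: str, triples: List[Triple], depth: int = 4) -> Counter:
    """Stand-in for a trained embedding: counts of the tokens that co-occur
    with `entity` in its walks. RDF2Vec would instead train word2vec on them."""
    counts: Counter = Counter()
    for walk in graph_walks(entity, triples, depth):
        counts.update(tok for tok in walk if tok != entity)
    return counts


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


if __name__ == "__main__":
    print(path_similarity("Conquering", "Attack", TRIPLES))  # 1 / (1 + 2)
    print(cosine(context_vector("Conquering", TRIPLES),
                 context_vector("Motion", TRIPLES)))
```

On this toy graph `Conquering` and `Attack` receive identical context vectors, so their cosine is 1.0; that is an artifact of the tiny graph, not a property of Frame2Vec.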
The rest of the paper is organized as follows. Section 2 reviews the state of the art and related work. Section 3 lists the data sources, resources, and tools adopted in our methodology. Section 4 describes MERGILO and its functionalities, which serve as the basis for Section 5, where we explain how frame semantics is employed to improve MERGILO. Section 6 presents a precision-recall analysis of the proposed approach on the dataset introduced in [2]. Finally, Section 7 concludes the paper with a discussion and final remarks, and highlights some future directions.
From text to knowledge graphs
Given the large amount of unstructured text available, extracting structured information and knowledge from it and integrating that knowledge into a coherent knowledge graph has become a key challenge. Several applications aim at extracting such structures, including digital assistants (Siri, Alexa, Cortana, and Google Now), question answering, and summarization. Projects such as Never Ending Language Learning (NELL) [7], OpenIE [8], YAGO [9], and Google Knowledge Vault [10] proposed various
VerbNet
VerbNet [37] is a broad-coverage English verb lexicon with links to other resources such as WordNet [38] and FrameNet [39]. VerbNet contains semantic roles and syntactic patterns, and groups verbs into classes based on Levin's classes. It generalizes verbs according to their shared syntactic behavior. These verb classes are organized into a hierarchy of parent classes and subclasses. For example, the verb conquer is a member of the class subjugate-42.3, whose meaning is to bring under domination.
MERGILO
MERGILO [2] is a method for generating and integrating knowledge graphs extracted from multiple text documents using FRED, a machine reader. Given two input sentences, it extracts the associated knowledge graphs with FRED.
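Once the nodes of two graphs have been aligned, integrating them reduces to a union of triples after renaming aligned nodes. The following is a minimal sketch assuming graphs given as plain (subject, predicate, object) tuples and an alignment computed beforehand; `merge_graphs` and the node names are hypothetical and do not reproduce MERGILO's implementation, which also computes the alignment itself using graph alignment and word similarity.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def merge_graphs(g1: List[Triple], g2: List[Triple],
                 alignment: Dict[str, str]) -> Set[Triple]:
    """Union of two triple sets after renaming each aligned node of g2 to its
    counterpart in g1; unaligned nodes are kept unchanged."""
    def rename(node: str) -> str:
        return alignment.get(node, node)

    merged: Set[Triple] = set(g1)
    for s, p, o in g2:
        merged.add((rename(s), p, rename(o)))
    return merged


if __name__ == "__main__":
    # Simplified, hypothetical graphs for "The Spaniards conquered the Incas."
    # and "The Incas were attacked by the Spaniards." (not actual FRED output).
    g1 = [("conquer_1", "agent", "Spaniard"), ("conquer_1", "patient", "Inca")]
    g2 = [("attack_1", "agent", "Spaniard_2"), ("attack_1", "patient", "Inca_2")]
    alignment = {"attack_1": "conquer_1", "Spaniard_2": "Spaniard", "Inca_2": "Inca"}
    for triple in sorted(merge_graphs(g1, g2, alignment)):
        print(triple)
```

The difficult step is producing the alignment: string matching cannot pair `attack_1` with `conquer_1`, which is where frame and role similarity become useful.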
Event-based knowledge reconciliation
Consider the two sentences “The Spaniards conquered the Incas.” and “The Incas were attacked by the Spaniards.” Both sentences describe actions related to the same past happening, i.e., an attack on, or invasion of, the Incas by the Spaniards. In such a case, the similarity measures introduced by MERGILO cannot effectively capture the similarity between the two events, because the two verbs are different. Figs. 3 and 6 show the FRED graphs of the first
Experimentation
We conducted several experiments to evaluate the feasibility of our approach. We built on top of the EECB 1.0 [19] gold standard for cross-document coreference resolution (CCR, cluster 1) and turned the coreferences between mentions into coreferences between entities through a semi-automatic process. EECB extends ECB [18], a corpus annotated with event coreferences, with entity coreference annotations. ECB contains text found through Google Search that was annotated with mentions, events
Conclusions
This paper presents an extension of MERGILO, a tool for reconciling knowledge graphs using graph alignment and word similarity. This study exploits Framester, a linguistic linked data hub that uses a novel formal semantics for frames to enhance semantic interoperability between linguistic resources. The paper introduces several ways of improving the basic MERGILO pipeline to deal with event-based knowledge reconciliation. In particular, several path-based similarity measures for
Acknowledgment
The research leading to these results has received funding from the European Union's Horizon 2020 Framework Programme for Research and Innovation (2014-2020) under grant agreement 643808, Project MARIO (Managing active and healthy aging with use of caring service robots), as well as from a public grant overseen by the French National Research Agency (ANR) as part of the program “Investissements d’Avenir” (reference: ANR-10-LABX-0083). Moreover, the authors gratefully acknowledge Sardinia Regional
References (62)
- Merging open knowledge extracted from text with MERGILO. Knowl.-Based Syst. (2016).
- Knowledge vault: a web-scale approach to probabilistic knowledge fusion. Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2014).
- Combining discourse representation theory with FrameNet. Frames, Corpora, and Knowledge Representation (2008).
- Semantic web machine reading with FRED. Semantic Web J. (2016).
- Framester: a wide coverage linguistic linked data hub.
- Gathering lexical linked data and knowledge patterns from FrameNet.
- Evaluating WordNet-based measures of lexical semantic relatedness. Comput. Linguist. (2006).
- RDF2Vec: RDF graph embeddings for data mining.
- Toward an architecture for never-ending language learning. AAAI (2010).
- Identifying relations for open information extraction. Proceedings of the Conference on Empirical Methods in Natural Language Processing (2011).
- YAGO2: a spatially and temporally enhanced knowledge base from Wikipedia. Artif. Intell.
- Snowball: extracting relations from large plain-text collections. Proceedings of the Fifth ACM Conference on Digital Libraries.
- Open information extraction from the web. Proceedings of the 20th International Joint Conference on Artificial Intelligence.
- Cross-document co-reference resolution using sample-based clustering with knowledge enrichment. Trans. Assoc. Comput. Linguist.
- Streaming cross document entity coreference resolution. 23rd International Conference on Computational Linguistics.
- Large-scale cross-document coreference using distributed inference and hierarchical models. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies.
- Robust entity clustering via phylogenetic inference. Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics.
- Unsupervised event coreference resolution. Comput. Linguist.
- Joint entity and event coreference resolution across documents. Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning.
- Ontology matching.
- PARIS: Probabilistic alignment of relations, instances, and schema. Proceedings of the VLDB Endowment.
- SiGMa: simple greedy matching for aligning large knowledge bases. KDD 2013.
- Fast approximate quadratic programming for graph matching. PLoS ONE.
- A new graph-based method for pairwise global network alignment. BMC Bioinform.
- Global alignment of protein–protein interaction networks. Data Mining for Systems Biology: Methods and Protocols.
- Efficient graph matching. Encyclopedia of Data Warehousing and Mining.
- Extracting semantic representations from word co-occurrence statistics: stop-lists, stemming, and SVD. Behav. Res. Methods.
- Indexing by latent semantic analysis. J. Am. Soc. Inf. Sci.
- Efficient estimation of word representations in vector space. CoRR.
- Distributed representations of sentences and documents. Proceedings of the 31st International Conference on Machine Learning, ICML 2014, Beijing, China, 21–26 June 2014.