Event-based knowledge reconciliation using frame embeddings and frame similarity

doi:10.1016/j.knosys.2017.08.014

Knowledge-Based Systems

Volume 135, 1 November 2017, Pages 192-203

https://doi.org/10.1016/j.knosys.2017.08.014

Abstract

This paper proposes an evolution of MERGILO, a tool for reconciling knowledge graphs extracted from text using graph alignment and word similarity. The reconciled knowledge graphs are typically used for multi-document summarization or to detect knowledge evolution across document series. The main improvement concerns event reconciliation, i.e., reconciling knowledge graphs generated from texts that describe two similar events differently. To obtain a complete semantic representation of events, we use the FRED semantic web machine reader jointly with Framester, a linguistic linked data hub represented using a novel formal semantics for frames. Framester is used to enrich the extracted event knowledge with semantic frames. We extend MERGILO with similarities based on the graph structure of semantic frames and on the subsumption hierarchy of semantic roles as defined in Framester. Using an evaluation strategy similar to the one used for MERGILO, we show the improvement of the new approach (MERGILO plus semantic frame/role similarities) over the baseline.

Introduction

Several approaches have been proposed for extracting knowledge graphs from text. These knowledge graphs are generated with the aim of making unstructured text machine-readable [1]. When multiple texts describe similar events, it is more efficient and usable to provide the machine with a single combination of the graphs generated from those texts. With this merged graph, a machine reader can obtain the knowledge contained in multiple texts from one consolidated graph instead of reading several graphs. This problem, termed “Knowledge Reconciliation” (KR), has recently been addressed by MERGILO [2], a tool for reconciling knowledge graphs using graph alignment and word similarity. The reconciled knowledge graphs can then be used by specific NLP applications, in particular by graph-based text summarization (which aims at summarizing the knowledge represented in multiple closely related pieces of text) and for assessing sentence or document similarity.

The current study mainly addresses knowledge reconciliation from the perspective of events. In a text, an event is usually expressed syntactically by a verb, since the verb defines a relation between the event participants. The first step in event-based knowledge reconciliation is therefore to extract event-oriented knowledge graphs. To do so, we use FRED, the machine reader presented in [1], which generates an RDF/OWL graph from any open-domain input text.

To deal with different lexical units describing the same or similar events, we extend the existing pipeline by enriching the knowledge graphs generated by FRED with semantic frames as defined in FrameNet¹. For this purpose, this study also uses the mappings between VerbNet² (i.e., VerbNet verb classes and VerbNet roles) and FrameNet contained in Framester [3]. Framester is a linguistic linked data hub formulated using a novel formal semantics for frames, with the aim of improving semantic interoperability between linguistic resources. Framester uses the RDF version of FrameNet [4]³, formalizes the FrameNet graph in OWL, and introduces a rich subsumption hierarchy over FrameNet frame elements (semantic roles).
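As an illustration of how a subsumption hierarchy over roles can yield a graded similarity instead of a yes/no string match, the sketch below computes a Wu-Palmer-style score, one of the classic WordNet-style measures, on a toy hierarchy. The role names and the hierarchy are hypothetical stand-ins, not Framester data, and the choice of Wu-Palmer is an assumption rather than the exact measure used in this work.

```python
# Toy role hierarchy (child -> parent). The role names are hypothetical
# and are NOT taken from Framester's actual role hierarchy.
ROLE_PARENT = {
    "Participant": None,  # root of the toy hierarchy
    "Agent": "Participant",
    "Patient": "Participant",
    "Assailant": "Agent",
    "Conqueror": "Agent",
    "Victim": "Patient",
}


def ancestors(role):
    """Path from `role` up to the root, `role` first."""
    path = []
    while role is not None:
        path.append(role)
        role = ROLE_PARENT[role]
    return path


def depth(role):
    """Depth counted in nodes, so the root has depth 1."""
    return len(ancestors(role))


def lowest_common_subsumer(a, b):
    """Most specific role that subsumes both `a` and `b` (None if disjoint)."""
    above_b = set(ancestors(b))
    for role in ancestors(a):
        if role in above_b:
            return role
    return None


def wu_palmer(a, b):
    """Wu-Palmer similarity: 2 * depth(lcs) / (depth(a) + depth(b))."""
    lcs = lowest_common_subsumer(a, b)
    if lcs is None:
        return 0.0
    return 2 * depth(lcs) / (depth(a) + depth(b))


print(wu_palmer("Assailant", "Conqueror"))
print(wu_palmer("Assailant", "Victim"))
```

Under this toy hierarchy, `Assailant` and `Conqueror` share the parent `Agent` and score 2/3, whereas `Assailant` and `Victim` meet only at the root and score 1/3; exact string matching would give 0 to both pairs.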

We use Framester graph representations to improve the similarity between nodes and between edges, where nodes represent frames and edges represent roles. When different verbs denote similar events, i.e., when different verbs evoke different frames that are connected in the FrameNet graph through the semantic relations already defined in FrameNet (such as Inheritance, SubFrame, ...), frame and semantic-role similarity measures can improve on the simple string-matching techniques used in MERGILO. MERGILO already computes the similarity between roles, represented as edges in the FRED graphs, but it only uses string matching to decide whether two roles are similar. To address this, we consider similarities based on the graph structure of FrameNet frames and on the subsumption hierarchy of semantic roles defined in Framester. Since the FrameNet graph organizes frames through semantic relations, we exploit this structure by adapting WordNet similarity measures [5] to the FrameNet graph.

We further exploit vector representations of frames built from the FrameNet graph and from the subsumption hierarchy of roles as represented in Framester. Following the RDF2Vec approach [6], we generate graph-based frame embeddings, referred to as Frame2Vec. These embeddings rely on graph-mining techniques, such as graph walks and graph kernels, to turn the graph into sequences, which are then used to learn the vector representations. To compute the similarity between two frames and between two roles, this study uses WordNet-style similarities and cosine similarity, which yields a better consolidation of multiple graphs and an improvement over the results of the baseline knowledge reconciliation algorithm, MERGILO [2]. The embeddings could also be used in other NLP applications; here, we use them for knowledge reconciliation.
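The walk-then-embed idea behind Frame2Vec can be sketched in a few lines: enumerate walks over a frame graph, treat each walk as a sentence, derive one vector per frame, and compare frames with cosine similarity. This is a minimal, dependency-free sketch under several assumptions: the graph and its edges are invented for illustration, relation labels are dropped from the walks (RDF2Vec also includes them), and simple co-occurrence count vectors stand in for the word2vec model that RDF2Vec trains on the walk sequences.

```python
import math
from collections import Counter, defaultdict

# Toy frame graph as (frame, relation, frame) triples. The edges are
# invented for illustration; they are NOT actual FrameNet relations.
TRIPLES = [
    ("Attack", "Inheritance", "Intentionally_affect"),
    ("Invading", "Inheritance", "Intentionally_affect"),
    ("Conquering", "SubFrame", "Invading"),
    ("Ingestion", "Inheritance", "Manipulation"),
]


def build_graph(triples):
    """Undirected adjacency lists over frames (relation labels dropped)."""
    graph = defaultdict(list)
    for subj, _rel, obj in triples:
        graph[subj].append(obj)
        graph[obj].append(subj)
    return graph


def all_walks(graph, start, depth):
    """Enumerate every walk of `depth` hops from `start`, breadth-first."""
    walks = [[start]]
    for _ in range(depth):
        walks = [walk + [nxt] for walk in walks for nxt in graph[walk[-1]]]
    return walks


def context_vectors(walks, window=2):
    """Count-based stand-in for word2vec: co-occurrence counts in a window."""
    vectors = defaultdict(Counter)
    for walk in walks:
        for i, frame in enumerate(walk):
            for j in range(max(0, i - window), min(len(walk), i + window + 1)):
                if j != i:
                    vectors[frame][walk[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[key] for key, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


graph = build_graph(TRIPLES)
corpus = [w for frame in list(graph) for w in all_walks(graph, frame, depth=3)]
vectors = context_vectors(corpus)
print(cosine(vectors["Attack"], vectors["Conquering"]))
print(cosine(vectors["Attack"], vectors["Ingestion"]))
```

Frames in the same region of the toy graph (`Attack` and `Conquering`) obtain a positive score because they share context frames such as `Intentionally_affect`, while `Attack` and `Ingestion` share no context and score 0. Swapping in a trained skip-gram model would give dense vectors but follow the same walk-then-embed pattern.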

The paper is organized as follows. Section 2 introduces the state of the art and related work. Section 3 lists the data sources, resources and tools adopted in our methodology. Section 4 describes MERGILO and its functionalities, which serve as the basis for Section 5, where we explain how frame semantics has been employed to improve MERGILO. Section 6 presents a precision-recall analysis of the proposed approach on the dataset introduced in [2]. Finally, Section 7 concludes the paper with a discussion and some remarks, and outlines future directions.

Section snippets

From text to knowledge graphs

Given the large amount of unstructured text available, a key challenge is to extract structured information and knowledge from it and to integrate them into a coherent knowledge graph. Several applications aim at extracting these structures, such as digital assistants (Siri, Alexa, Cortana, and Google Now), question answering, and summarization. Projects such as Never Ending Language Learning (NELL) [7], OpenIE [8], YAGO [9], and Google Knowledge Vault [10] proposed various

VerbNet

VerbNet [37] is a broad-coverage English verb lexicon with links to other resources such as WordNet [38] and FrameNet [39]. VerbNet contains semantic roles and syntactic patterns, and groups verbs into classes based on Levin’s classes, i.e., it generalizes over verbs according to their shared syntactic behavior. These verb classes are organized into a hierarchy of parent classes and subclasses. For example, the verb conquer is a member of the class subjugate-42.3, which means to bring under domination.
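VerbNet class identifiers carry the class hierarchy in their names. Assuming the usual convention in which a subclass extends its parent identifier with a `-<n>` suffix, and that class names themselves contain no hyphen, parent and ancestor classes can be recovered with plain string handling, as in this sketch. Apart from `subjugate-42.3`, taken from the example above, the identifiers used here only illustrate the format, and the helper names are hypothetical.

```python
# Toy membership table: only the conquer / subjugate-42.3 pair comes from the text.
VERB_CLASSES = {"conquer": ["subjugate-42.3"]}


def parent_class(class_id):
    """Parent of a VerbNet subclass ID, or None for a top-level class.

    Assumes subclass IDs extend their parent with '-<n>' suffixes,
    e.g. 'run-51.3.2-1' is treated as a subclass of 'run-51.3.2'.
    """
    name, _, numbers = class_id.partition("-")
    parts = numbers.split("-")
    if len(parts) == 1:
        return None  # no subclass suffix: this is a top-level class
    return name + "-" + "-".join(parts[:-1])


def class_lineage(class_id):
    """The class itself followed by all of its ancestors, most specific first."""
    lineage = []
    while class_id is not None:
        lineage.append(class_id)
        class_id = parent_class(class_id)
    return lineage


def share_top_class(c1, c2):
    """True if two classes belong to the same top-level class tree."""
    return class_lineage(c1)[-1] == class_lineage(c2)[-1]


for cls in VERB_CLASSES["conquer"]:
    print(cls, class_lineage(cls))
print(class_lineage("run-51.3.2-2-1"))
```

Two verbs whose classes share a top-level ancestor can then be treated as related even when they sit in different subclasses.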

MERGILO

MERGILO [2] is a method for generating and integrating knowledge graphs extracted from multiple text documents using FRED, a machine reader. Given two input sentences, it extracts the knowledge graph associated with each of them using FRED.
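The general idea of reconciling two sentence-level graphs can be illustrated with a deliberately simplified stand-in: align nodes whose labels are lexically similar, then take the union of the two triple sets after renaming the aligned nodes. This is not MERGILO's actual alignment algorithm, which combines graph alignment with word similarity; the triples below are hand-written placeholders rather than real FRED output, and the 0.8 threshold is arbitrary.

```python
from difflib import SequenceMatcher


def label_similarity(a, b):
    """Lexical similarity in [0, 1] between two node labels."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def nodes(triples):
    """All subjects and objects appearing in a set of (s, p, o) triples."""
    return {s for s, _, _ in triples} | {o for _, _, o in triples}


def greedy_align(g1, g2, threshold=0.8):
    """Map nodes of g2 to nodes of g1, taking the best-scoring pairs first."""
    candidates = sorted(
        ((label_similarity(a, b), a, b) for a in nodes(g1) for b in nodes(g2)),
        reverse=True,
    )
    alignment, used1, used2 = {}, set(), set()
    for score, a, b in candidates:
        if score < threshold:
            break  # remaining candidates are even less similar
        if a not in used1 and b not in used2:
            alignment[b] = a
            used1.add(a)
            used2.add(b)
    return alignment


def merge(g1, g2, alignment):
    """Union of both graphs after renaming aligned g2 nodes to their g1 names."""
    renamed = {(alignment.get(s, s), p, alignment.get(o, o)) for s, p, o in g2}
    return set(g1) | renamed


# Placeholder triples (not real FRED output).
g1 = {("conquer_1", "agent", "Spaniard"), ("conquer_1", "patient", "Inca")}
g2 = {("attack_1", "agent", "Spaniard"), ("attack_1", "patient", "Inca")}
alignment = greedy_align(g1, g2)
merged = merge(g1, g2, alignment)
print(alignment)
print(sorted(merged))
```

On these two graphs the entity nodes `Spaniard` and `Inca` are aligned, but `conquer_1` and `attack_1` stay separate, so the merged graph keeps two distinct event nodes: purely lexical matching cannot reconcile events expressed by different verbs.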

Event-based knowledge reconciliation

Let us consider the two sentences “The Spaniards conquered the Incas.” and “The Incas were attacked by the Spaniards.” The two sentences describe two actions related to the same past happening, i.e., an attack on, or invasion of, the Incas by the Spaniards. In such a case, the similarity measures introduced by MERGILO cannot effectively capture the similarity between the two events, because the two verbs are different. Figs. 3 and 6 show the FRED graphs of the first
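A minimal sketch of why frame information helps in this example: an exact lemma match gives no credit to conquer versus attack, whereas a WordNet-style path measure, 1/(1 + shortest path length), computed over a frame graph does. The verb-to-frame mapping is an assumption (here frames come from FRED and Framester), the frame edges are invented for illustration rather than copied from FrameNet, and taking the maximum of the two scores is one simple combination choice, not necessarily the one used in this work.

```python
from collections import deque

# Hypothetical verb-to-frame mapping.
VERB_TO_FRAME = {"conquer": "Conquering", "attack": "Attack"}

# Toy undirected frame graph; edges are illustrative, NOT FrameNet data.
FRAME_EDGES = [
    ("Conquering", "Invading"),
    ("Invading", "Intentionally_affect"),
    ("Attack", "Intentionally_affect"),
]


def shortest_path_length(edges, source, target):
    """Breadth-first search over an undirected edge list; None if unreachable."""
    adjacency = {}
    for x, y in edges:
        adjacency.setdefault(x, set()).add(y)
        adjacency.setdefault(y, set()).add(x)
    seen, queue = {source}, deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def path_similarity(frame1, frame2):
    """WordNet-style path measure: 1 / (1 + shortest path length)."""
    dist = shortest_path_length(FRAME_EDGES, frame1, frame2)
    return 0.0 if dist is None else 1.0 / (1 + dist)


def event_similarity(verb1, verb2):
    """Max of an exact lemma match (string matching) and frame similarity."""
    lexical = 1.0 if verb1 == verb2 else 0.0
    f1, f2 = VERB_TO_FRAME.get(verb1), VERB_TO_FRAME.get(verb2)
    frame = path_similarity(f1, f2) if f1 and f2 else 0.0
    return max(lexical, frame)


print(event_similarity("conquer", "attack"))
```

With the toy edges, `Conquering` and `Attack` are three hops apart, so `event_similarity("conquer", "attack")` is 1/4 = 0.25 instead of 0.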

Experimentation

We conducted several experiments to evaluate the feasibility of our approach. We built on top of the EECB 1.0 [19] gold standard for CCR (cluster 1) and, with a semi-automatic process, converted the coreferences between mentions into coreferences between entities. The EECB gold standard is an extension of ECB [18], a corpus annotated with event coreferences, that also contains entity coreference annotations. ECB contains text found through Google Search that was annotated with mentions, events
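Evaluating reconciliation against a coreference gold standard amounts to comparing the pairs of nodes the system merges with the pairs annotated as coreferent. The sketch below computes pairwise precision, recall and F1 over unordered pairs; it is a generic formulation assumed for illustration, not necessarily the exact protocol of this evaluation, and the node identifiers are hypothetical.

```python
def pairwise_prf(predicted, gold):
    """Precision, recall and F1 of predicted merges against gold coreference pairs.

    Pairs are unordered, so (a, b) and (b, a) count as the same merge.
    """
    pred = {frozenset(pair) for pair in predicted}
    ref = {frozenset(pair) for pair in gold}
    true_positives = len(pred & ref)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical node identifiers from two reconciled graphs.
gold = [("Spaniards_1", "Spaniards_2"), ("Incas_1", "Incas_2"), ("conquer_1", "attack_2")]
predicted = [("Spaniards_2", "Spaniards_1"), ("Incas_1", "Incas_2")]
precision, recall, f1 = pairwise_prf(predicted, gold)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

Here the system recovers two of the three gold pairs and makes no wrong merges, so precision is 1.0, recall is 2/3 and F1 is 0.8; the missed pair is the event pair that lexical matching cannot link.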

Conclusions

This paper presents an extension of MERGILO, a tool for reconciling knowledge graphs using graph alignment and word similarity. This study exploits Framester, a linguistic linked data hub formulated using a novel formal semantics for frames, in order to enhance semantic interoperability between linguistic resources. This paper introduces several ways of improving the basic MERGILO pipeline to deal with event-based knowledge reconciliation. In particular, several path-based similarity measures for

Acknowledgment

The research leading to these results has received funding from the European Union Horizon 2020 Framework Programme for Research and Innovation (2014-2020) under grant agreement 643808, Project MARIO (Managing active and healthy aging with use of caring service robots), as well as from a public grant overseen by the French National Research Agency (ANR) as part of the program “Investissements d’Avenir” (reference: ANR-10-LABX-0083). Moreover, the authors gratefully acknowledge Sardinia Regional

References (62)

M. Mongiovì et al.
Merging open knowledge extracted from text with MERGILO
Knowl.-Based Syst.
(2016)
X. Dong et al.
Knowledge vault: a web-scale approach to probabilistic knowledge fusion
Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
(2014)
J. Bos et al.
Combining discourse representation theory with framenet
Frames, Corpora, and Knowledge Representation
(2008)
A. Gangemi et al.
Semantic web machine reading with FRED
Semantic Web J.
(2016)
A. Gangemi et al.
Framester: a wide coverage linguistic linked data hub
A.G. Nuzzolese et al.
Gathering lexical linked data and knowledge patterns from FrameNet
A. Budanitsky et al.
Evaluating wordnet-based measures of lexical semantic relatedness
Comput. Linguist.
(2006)
P. Ristoski et al.
Rdf2vec: RDF graph embeddings for data mining
A. Carlson et al.
Toward an architecture for never-ending language learning
AAAI
(2010)
A. Fader et al.
Identifying relations for open information extraction
Proceedings of the Conference on Empirical Methods in Natural Language Processing
(2011)
J. Hoffart et al.
Yago2: a spatially and temporally enhanced knowledge base from wikipedia
Artif. Intell.
(2013)
S. Brin
Extracting patterns and relations from the world-wide web
Proceedings of the 1998 International Workshop...
E. Agichtein et al.
Snowball: extracting relations from large plain-text collections
Proceedings of the Fifth ACM Conference on Digital Libraries
(2000)
M. Banko et al.
Open information extraction from the web
Proceedings of the 20th International Joint Conference on Artificial Intelligence
(2007)
S. Dutta et al.
Cross-document co-reference resolution using sample-based clustering with knowledge enrichment
Trans. Assoc. Comput. Linguist.
(2015)
D. Rao et al.
Streaming cross document entity coreference resolution
23rd International Conference on Computational Linguistics
(2010)
S. Singh et al.
Large-scale cross-document coreference using distributed inference and hierarchical models
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies
(2011)
N. Andrews et al.
Robust entity clustering via phylogenetic inference
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics
(2014)
C. Bejan et al.
Unsupervised event coreference resolution
Comput. Linguist.
(2014)
H. Lee et al.
Joint entity and event coreference resolution across documents
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
(2012)
J. Euzenat et al.
Ontology matching
(2013)
F.M. Suchanek et al.
PARIS: Probabilistic alignment of relations, instances, and schema
Proceedings of the VLDB Endowment
(2011)
S. Lacoste-Julien et al.
Sigma: simple greedy matching for aligning large knowledge bases
KDD2013
(2013)
J.T. Vogelstein et al.
Fast approximate quadratic programming for graph matching
PLoS ONE
(2015)
G.W. Klau
A new graph-based method for pairwise global network alignment
BMC Bioinform.
(2009)
M. Mongiovì et al.
Global alignment of protein–protein interaction networks
Data Mining for Systems Biology: Methods and Protocols
(2013)
D.R. Recupero
Efficient graph matching
Encyclopedia of Data Warehousing and Mining
(2009)
J.A. Bullinaria et al.
Extracting semantic representations from word co-occurrence statistics: stop-lists, stemming, and svd
Behav. Res. Methods
(2012)
S. Deerwester et al.
Indexing by latent semantic analysis
J. Am. Soc. Inf. Sci.
(1990)
T. Mikolov et al.
Efficient estimation of word representations in vector space
CoRR
(2013)
Q.V. Le et al.
Distributed representations of sentences and documents
Proceedings of the 31st International Conference on Machine Learning, ICML 2014, Beijing, China, 21–26 June 2014
(2014)

