Knowledge-Based Systems

Volume 252, 27 September 2022, 109471

Latent graph learning with dual-channel attention for relation extraction

https://doi.org/10.1016/j.knosys.2022.109471

Abstract

As a building block of information retrieval, relation extraction aims to predict the relation type between two given entities in a piece of text. The task becomes challenging when confronted with long text that contains many task-unrelated tokens. Recent attempts to solve this problem have resorted to learning the relatedness among tokens. However, how to obtain an appropriate graph for better relatedness representation remains an open question, and existing methods leave room for improvement. In this paper, we propose a novel latent graph learning method to enhance the expressivity of contextual information for the entities of interest. In particular, we design a dual-channel attention mechanism for multi-view graph learning and pool the learned views to sift out unrelated tokens from the latent graph. This process can be repeated to further refine the latent structure. We show that our method achieves superior performance on several benchmark datasets compared to strong baseline models and a prior multi-view graph learning approach.

Introduction

Relation extraction (RE), which aims to identify the unknown semantic relations between given entity pairs in a text, is fundamental to a variety of downstream natural language processing tasks, ranging from knowledge graph extension, question answering, and event detection to natural language generation. The advent of deep learning has led to prominent improvements in encoding the semantic and syntactic information in data [1]. In particular, convolutional neural networks (CNNs) have been successfully introduced [2] to capture the local properties of text. To model not only the local context but also long-range dependencies, the attention mechanism has been used to build more expressive and complex models such as BERT [3] and its variants [4], [5]. By leveraging BERT for pre-training, the performance of RE models has been improved [5], [6]. However, a fundamental limitation of contextual learning remains: irrelevant words are often incorporated into the computation of the BERT network.

Graph modeling offers a way to learn richer information about indirect interactions among tokens (e.g., higher-order or multi-hop dependencies) through message-passing graph neural network techniques. Accordingly, many applications of this idea [7], [8] to RE have shown significant performance improvement. Typically, each node in the graph denotes a token or an entity in the given text. Edges can be the syntactic relations derived by an external parser, but this approach introduces additional error from dependency parsing. Moreover, a dependency tree is task-agnostic and may fail to reflect the semantics of the entities' relations, so learning entity representations on it may lead to incorrect relation inference. Alternatively, recent studies [9], [10], [11], [12] turn to learning the latent graph directly from the text, such that the graph nodes as well as the edges between them are all indicative of the target relation. This becomes quite challenging when the text is a long sequence.
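As a rough illustration of this message-passing view (not tied to any particular model in this paper), a single graph-convolution step over a token–token adjacency matrix can be sketched as below; the class name and the degree normalization are assumptions made for the sake of the example.

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One round of message passing over a token-token adjacency matrix."""
    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, h, adj):
        # h:   (batch, n_tokens, dim)      token representations
        # adj: (batch, n_tokens, n_tokens) adjacency, e.g. from a dependency parse
        deg = adj.sum(dim=-1, keepdim=True).clamp(min=1)  # node degrees for normalization
        agg = torch.bmm(adj, h) / deg                     # aggregate neighbor messages
        return torch.relu(self.linear(agg))               # transform and activate
```

Stacking several such layers lets a token's representation absorb information from multi-hop neighbors, which is the property the latent-graph approaches above try to exploit without relying on a parser-produced adjacency.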

An example is shown in Table 1, where the goal of the task is to predict the relation type between the subject entity ‘46-year-old’ and the object entity ‘he’, and the relation type is labeled as ‘age’. As the example shows, there are many irrelevant tokens around the entities of interest, and complicated syntactic relations are involved. Therefore, it is nontrivial to uncover (by learning) the graph topology related to the target entity pair.

The key is to identify important nodes/edges and filter out the unimportant ones, which is similar to the downsampling (pooling) operation in convolutional neural networks. Following the line of multi-view graph learning [12], where an edge in the latent token–token graph is assumed to embody multiple types of relationships, we also learn the latent graph by leveraging multi-view graph representations. In particular, considering that the self-attention mechanism commonly used for graph generation is inadequate to capture meaningful interactions between tokens, we propose a Dual-channel self-Attention mechanism (DAttention) for multi-view graph generation, which learns queries and keys from two channels and fuses them adaptively for scoring. Moreover, we distill less-related nodes (i.e., tokens) from the generated graphs with a two-stage graph refining technique to favor long-text RE prediction. Our framework, which we refer to as DA-GPN for convenience, also allows multiple layers of the above process to be stacked for deep refinement of the latent graph. When initializing the graph structure, our method follows the scheme of previous work [12] but presents a different graph initialization method that learns multiple graphs corresponding to multiple views, based on pre-trained node embeddings from a language model, so as to capture all possible relations between tokens. We then employ graph convolution to learn the respective node representations for each view. After two-stage graph refining, the initial latent graph is fed into the loop of the latent graph learning module. Finally, the features from the graph learning module and the pre-trained language model are merged and fed into the prediction layer. Experiments demonstrate the advantages of DA-GPN in token-level latent graph learning, and DA-GPN achieves a new state of the art on the benchmark datasets for the long-text RE task.
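To make the description above more concrete, the following is a minimal, hedged sketch of what a dual-channel attention module for multi-view graph generation might look like. It is not the authors' implementation: the fusion gate, the per-view projections, and the tensor shapes are all assumptions chosen only to illustrate the idea of scoring token pairs from two channels and fusing the scores adaptively.

```python
import torch
import torch.nn as nn

class DualChannelAttention(nn.Module):
    """Sketch: two query/key channels whose attention scores are fused by a
    learned gate, producing one token-token graph per view."""
    def __init__(self, dim, n_views):
        super().__init__()
        self.q1, self.k1 = nn.Linear(dim, dim), nn.Linear(dim, dim)  # channel 1
        self.q2, self.k2 = nn.Linear(dim, dim), nn.Linear(dim, dim)  # channel 2
        self.gate = nn.Linear(2 * dim, 1)                            # adaptive fusion (assumed form)
        self.views = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_views))

    def forward(self, h):
        # h: (batch, n, dim) token representations from the encoder
        def scores(q_proj, k_proj, x):
            q, k = q_proj(x), k_proj(x)
            return torch.bmm(q, k.transpose(1, 2)) / x.size(-1) ** 0.5

        # sequence-level gate deciding how much each channel contributes
        g = torch.sigmoid(self.gate(torch.cat([h.mean(1), h.max(1).values], dim=-1)))
        g = g.unsqueeze(-1)                                  # (batch, 1, 1)
        graphs = []
        for view in self.views:
            x = view(h)                                      # view-specific features
            a = g * scores(self.q1, self.k1, x) + (1 - g) * scores(self.q2, self.k2, x)
            graphs.append(torch.softmax(a, dim=-1))          # one latent graph per view
        return torch.stack(graphs, dim=1)                    # (batch, n_views, n, n)
```

In this reading, the multi-view output would then be pooled and refined (the two-stage refinement described above) so that only tokens important to every view survive in the final latent graph.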

The contribution of our work can be summarized as follows:

  • We propose DA-GPN to distill the token-level latent graph, where the graph learning module generates multi-view graphs that convey richer information about the latent graph, and the latent graph is obtained via a two-stage refinement of the multi-view graphs such that only the nodes important to all views are retained.

  • We propose a dual-channel self-attention mechanism for generating multiple views of the latent graph.

  • We also explore the importance of graph initialization, as well as of the proposed dual-channel attention, in the whole architecture. Surprisingly, we find that graph initialization makes a non-trivial contribution to the overall prediction.

The code is available on GitHub.1

Related work

We briefly review recent work on sequence-based, graph-based, and BERT-based methods.

Problem definition

In this section, we first describe the problem formally. Given a sequence $X = \{x_1, x_2, \ldots, x_n\}$ that contains $n$ tokens and an entity pair $(e_o, e_s)$ of interest, our task is to predict whether there is a relation between the two entities in the sequence and to infer the correct relation type. The subject entity and object entity are defined as $e_s = \{x_i, \ldots, x_j\}$ and $e_o = \{x_k, \ldots, x_m\}$, respectively. We further describe our task as $T_r = \{(e_s, e_o, y_i) : y_i \in R \cup \{\epsilon\}\}$, where $R$ is a predefined set of relations and $\epsilon$ denotes all cases in which none of the predefined relations holds.
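For illustration only, one instance of this task can be represented with a simple data structure such as the one below; the field names and the example sentence are assumptions made for this sketch and are not taken from any benchmark, with the entity pair and the 'age' label mirroring the Table 1 example.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class REInstance:
    """One relation-extraction example: a token sequence, the subject and
    object entity spans (inclusive token indices), and the gold relation."""
    tokens: List[str]            # x_1, ..., x_n
    subj_span: Tuple[int, int]   # (i, j): e_s = x_i, ..., x_j
    obj_span: Tuple[int, int]    # (k, m): e_o = x_k, ..., x_m
    relation: str                # a label from R, or a no-relation marker (epsilon)

# Hypothetical example in the spirit of Table 1
example = REInstance(
    tokens="The 46-year-old said he felt fine".split(),
    subj_span=(1, 1),            # subject entity: '46-year-old'
    obj_span=(3, 3),             # object entity: 'he'
    relation="age",
)
```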

Methodology

The overall model architecture of DA-GPN is shown in Fig. 1. There are three principal components in the proposed model, namely the pre-trained language model, the graph module, and classification. In particular, the pre-trained language model (e.g., BERT [3]) captures a semantic representation for each token. To be specific, the pre-trained language model encodes a sequence of $n$ tokens $X = \{x_1, x_2, \ldots, x_n\}$ into $H^0 = \{h_1, h_2, \ldots, h_n\}$, where $x_1$ represents the start token [CLS] and $x_n$ represents the end token [SEP].
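As a hedged sketch of this encoding step (using the Hugging Face transformers library rather than the authors' code, and with bert-base-uncased assumed purely for illustration), a BERT encoder produces one contextual vector per token:

```python
import torch
from transformers import BertTokenizerFast, BertModel

# Encode a sentence into contextual token vectors H0 = h_1, ..., h_n,
# with [CLS] and [SEP] added by the tokenizer.
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
encoder = BertModel.from_pretrained("bert-base-uncased")

sentence = "The 46-year-old said he felt fine"
inputs = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    h0 = encoder(**inputs).last_hidden_state  # shape (1, n, 768), one vector per token
```

These token vectors would then serve as the node features consumed by the graph module described above.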

Datasets and experimental setup

The proposed DA-GPN is evaluated on sentence-level and dialogue-level RE tasks, using two types of benchmark datasets respectively. In the sentence-level evaluation, TACRED [16] and its two variants are used. TACRED is a sentence-level RE dataset that contains 106,264 sentences and 42 different relations, namely 41 common relation types and a special non-relation type. Subject entities in TACRED are people and organizations, while object entities are categorized into 16 fine-grained types,

Conclusion

In this paper, we propose DA-GPN, a novel graph model for reinforcing relation extraction in an end-to-end fashion. Our model dynamically learns complex graph structures and discovers implicit relations between tokens in a long sequence via three key designs: preliminary graph learning for initializing the graph topology, multi-view graph pooling for refining the latent graph, and dual-channel self-attention for enhancing the learning of relationships between tokens. DA-GPN achieves decent performance on the benchmark datasets.

CRediT authorship contribution statement

Guogen Tang: Methodology, Software, Writing – original draft. Ping Li: Conceptualization, Writing – original draft, Supervision, Project administration. Yupeng He: Validation, Formal analysis, Visualization. Yan Chen: Validation. Yuan Zhong: Validation. Fangji Gan: Funding acquisition.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (33)

  • A. Graves, et al., Framewise phoneme classification with bidirectional LSTM and other neural network architectures, Neural Netw. (2005).
  • R. Socher, B. Huval, C.D. Manning, A.Y. Ng, Semantic compositionality through recursive matrix-vector spaces, in: ...
  • D. Zeng, K. Liu, S. Lai, G. Zhou, J. Zhao, Relation classification via convolutional deep neural network, in: ...
  • J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language ...
  • Y. Liu, et al., RoBERTa: A robustly optimized BERT pretraining approach (2019).
  • M. Joshi, et al., SpanBERT: Improving pre-training by representing and predicting spans, Trans. Assoc. Comput. Linguist. (2020).
  • S. Wu, Y. He, Enriching pre-trained language model with entity information for relation classification, in: Proceedings ...
  • Y. Zhang, P. Qi, C.D. Manning, Graph convolution over pruned dependency trees improves relation extraction, in: ...
  • Z. Guo, Y. Zhang, W. Lu, Attention guided graph convolutional networks for relation extraction, in: Proceedings of the ...
  • K. Hashimoto, Y. Tsuruoka, Neural machine translation with source-side latent graph parsing, in: Proceedings of the ...
  • F. Christopoulou, M. Miwa, S. Ananiadou, A walk-based model on entity graphs for relation extraction, in: Proceedings ...
  • F. Christopoulou, M. Miwa, S. Ananiadou, Connecting the dots: Document-level neural relation extraction with ...
  • F. Xue, A. Sun, H. Zhang, E.S. Chng, GDPNet: Refining latent multi-view graph for relation extraction, in: Proceedings ...
  • R. Cai, X. Zhang, H. Wang, Bidirectional recurrent convolutional neural network for relation classification, in: ...
  • Y. Shen, X.-J. Huang, Attention-based convolutional neural network for semantic relation extraction, in: Proceedings of ...
  • P. Zhou, W. Shi, J. Tian, Z. Qi, B. Li, H. Hao, B. Xu, Attention-based bidirectional long short-term memory networks ...