research-article

Revisiting the Evaluation Protocol of Knowledge Graph Completion Methods for Link Prediction

Authors:
Sudhanshu Tiwari, Rochester Institute of Technology, USA
Iti Bansal, Rochester Institute of Technology, USA
Carlos R. Rivero, Rochester Institute of Technology, USA

WWW '21: Proceedings of the Web Conference 2021, April 2021, Pages 809–820, https://doi.org/10.1145/3442381.3449856

Published: 03 June 2021

ABSTRACT

Completion methods learn models to infer missing (subject, predicate, object) triples in knowledge graphs, a task known as link prediction. The training phase is based on samples of positive triples and their negative counterparts. The test phase consists of ranking each positive triple with respect to its negative counterparts based on the scores obtained by a learned model. The best model ranks all positive triples first. Metrics like mean rank, mean reciprocal rank and hits at k are used to assess accuracy. Under this generic evaluation protocol, we observe several shortcomings: 1) Current metrics assume that each measurement is upper bounded by the same constant value and, therefore, are oblivious to the fact that, in link prediction, each positive triple may have a different number of negative counterparts, which alters the difficulty of ranking positive triples. 2) Benchmarking datasets contain anomalies (unrealistic redundancy) that allegedly simplify link prediction; however, current instantiations of the generic evaluation protocol do not integrate anomalies, which are just discarded based on a user-defined threshold. 3) Benchmarking datasets have been randomly split, which typically alters the graph topology and results in the training split not resembling the original dataset. 4) A single model is typically kept based on its accuracy over the validation split using a given metric; however, since metrics aggregate ranks into a single value, there may be no significant differences among the ranks produced by several models, all of which must then be evaluated in the test phase. In this paper, we contribute to the evaluation of link prediction as follows: 1) We propose a variation of the mean rank that considers the number of negative counterparts. 2) We define the anomaly coefficient of a predicate and integrate this coefficient into the protocol. 3) We propose a downscaling algorithm to generate training splits that reflect the original graph topology based on a nonparametric, unpaired statistical test. 4) During validation, we discard a learned model only if its output ranks are significantly different from the ranks of other models based on a nonparametric, paired statistical test. Our experiments over seven well-known datasets show that translation-based methods (TransD, TransE and TransH) significantly outperform recent methods, which entails that our understanding of the accuracy of completion methods for link prediction is far from perfect.
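
To make the first shortcoming concrete, the following Python sketch computes mean rank, mean reciprocal rank and hits at k from a list of ranks, and contrasts them with a rank that is scaled by each positive triple's own number of negative counterparts. The abstract does not give the authors' definition of their mean rank variation, so `normalized_mean_rank` is a hypothetical normalization chosen purely for illustration; all function names and the example scores are likewise assumptions.

```python
from statistics import mean


def rank_of_positive(pos_score, neg_scores):
    """1-based rank of a positive triple among its negative counterparts.

    Assumes a higher score means a more plausible triple; ties are
    counted against the positive triple (pessimistic ranking).
    """
    return 1 + sum(1 for s in neg_scores if s >= pos_score)


def mean_rank(ranks):
    """Average rank over all positive test triples (lower is better)."""
    return mean(ranks)


def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all positive test triples (higher is better)."""
    return mean(1.0 / r for r in ranks)


def hits_at_k(ranks, k):
    """Fraction of positive triples ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


def normalized_mean_rank(ranks, num_negatives):
    """Hypothetical normalization, NOT the paper's definition.

    Each rank is mapped to [0, 1] using that triple's own number of
    negatives, so rank 3 among 3 negatives counts as much worse than
    rank 3 among 99 negatives. Triples without negatives are skipped.
    """
    scaled = [(r - 1) / n for r, n in zip(ranks, num_negatives) if n > 0]
    return mean(scaled)


if __name__ == "__main__":
    # Two positive triples that both end up at rank 3, but the first
    # competes with 3 negatives and the second with 99.
    ranks = [rank_of_positive(0.9, [0.95, 0.5, 0.99]), 3]
    num_negatives = [3, 99]
    print("MR:", mean_rank(ranks))
    print("MRR:", mean_reciprocal_rank(ranks))
    print("Hits@3:", hits_at_k(ranks, 3))
    print("Normalized MR:", normalized_mean_rank(ranks, num_negatives))
```

In this sketch, the plain mean rank treats both triples identically, whereas the scaled variant separates them because the number of negative counterparts differs per triple.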

Published in
WWW '21: Proceedings of the Web Conference 2021
April 2021
4054 pages
ISBN: 9781450383127
DOI: 10.1145/3442381
Editors: Jure Leskovec (Stanford), Marko Grobelnik (Jožef Stefan Institute), Marc Najork (Google), Jie Tang (Tsinghua University), Leila Zia (Wikimedia Foundation)
Copyright © 2021 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Benchmarking
Data anomalies
Evaluation protocol
Knowledge graph completion
Link prediction
Metrics
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%