ABSTRACT
Completion methods learn models to infer missing (subject, predicate, object) triples in knowledge graphs, a task known as link prediction. The training phase is based on samples of positive triples and their negative counterparts. The test phase consists of ranking each positive triple with respect to its negative counterparts based on the scores obtained by a learned model. The best model ranks all positive triples first. Metrics like mean rank, mean reciprocal rank and hits at k are used to assess accuracy. Under this generic evaluation protocol, we observe several shortcomings: 1) Current metrics assume that each measurement is upper bounded by the same constant value and are therefore oblivious to the fact that, in link prediction, each positive triple may have a different number of negative counterparts, which alters the difficulty of ranking positive triples. 2) Benchmarking datasets contain anomalies (unrealistic redundancy) that allegedly simplify link prediction; however, current instantiations of the generic evaluation protocol do not integrate anomalies, which are simply discarded based on a user-defined threshold. 3) Benchmarking datasets have been randomly split, which typically alters the graph topology and results in a training split that does not resemble the original dataset. 4) A single model is typically kept based on its accuracy over the validation split using a given metric; however, since metrics aggregate ranks into a single value, there may be no significant differences among the ranks produced by several models, all of which must then be evaluated in the test phase. In this paper, we contribute to the evaluation of link prediction as follows: 1) We propose a variation of the mean rank that considers the number of negative counterparts. 2) We define the anomaly coefficient of a predicate and integrate this coefficient into the protocol. 3) We propose a downscaling algorithm to generate training splits that reflect the original graph topology based on a nonparametric, unpaired statistical test. 4) During validation, we discard a learned model only if its output ranks are significantly different from the ranks of the other models according to a nonparametric, paired statistical test. Our experiments over seven well-known datasets show that translation-based methods (TransD, TransE and TransH) significantly outperform recent methods, which entails that our understanding of the accuracy of completion methods for link prediction is far from perfect.
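The rank-based metrics named in the abstract, and the first shortcoming, can be illustrated with a minimal Python sketch. The functions `mean_rank`, `mean_reciprocal_rank` and `hits_at_k` follow the standard definitions. The function `normalized_mean_rank` is a hypothetical illustration of how the number of negative counterparts could be taken into account; it is an assumption made for this example and not the variation proposed in the paper, whose definition is not given here. The sample ranks and counts are invented.

```python
from statistics import mean


def mean_rank(ranks):
    """Average rank of the positive triples (lower is better)."""
    return mean(ranks)


def mean_reciprocal_rank(ranks):
    """Average of 1/rank over the positive triples (higher is better)."""
    return mean(1.0 / r for r in ranks)


def hits_at_k(ranks, k):
    """Fraction of positive triples ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


def normalized_mean_rank(ranks, num_negatives):
    """Hypothetical variant, NOT the paper's definition.

    A positive triple with n negative counterparts has a rank in
    [1, n + 1]. Rescaling each rank to [0, 1] before averaging makes
    measurements with different upper bounds comparable.
    Assumes every n is greater than zero.
    """
    return mean((r - 1) / n for r, n in zip(ranks, num_negatives))


# Invented example: three positive triples, their ranks, and how many
# negative counterparts each one was ranked against.
ranks = [1, 2, 5]
num_negatives = [4, 4, 8]

print(mean_rank(ranks))
print(mean_reciprocal_rank(ranks))
print(hits_at_k(ranks, 2))
print(normalized_mean_rank(ranks, num_negatives))
```

In this toy data, a rank of 5 among 9 candidates counts the same toward the plain mean rank as it would among 1,000 candidates, which is the kind of difference in difficulty that the first shortcoming refers to.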