DOI: 10.1145/3442381.3449856
research-article

Revisiting the Evaluation Protocol of Knowledge Graph Completion Methods for Link Prediction

Published: 03 June 2021

ABSTRACT

Completion methods learn models to infer missing (subject, predicate, object) triples in knowledge graphs, a task known as link prediction. The training phase is based on samples of positive triples and their negative counterparts. The test phase consists of ranking each positive triple with respect to its negative counterparts based on the scores obtained by a learned model. The best model ranks all positive triples first. Metrics like mean rank, mean reciprocal rank and hits at k are used to assess accuracy.

Under this generic evaluation protocol, we observe several shortcomings:
1) Current metrics assume that each measurement is upper bounded by the same constant value and, therefore, are oblivious to the fact that, in link prediction, each positive triple may have a different number of negative counterparts, which alters the difficulty of ranking positive triples.
2) Benchmarking datasets contain anomalies (unrealistic redundancy) that allegedly simplify link prediction; however, current instantiations of the generic evaluation protocol do not integrate anomalies, which are just discarded based on a user-defined threshold.
3) Benchmarking datasets have been randomly split, which typically alters the graph topology and results in the training split not resembling the original dataset.
4) A single model is typically kept based on its accuracy over the validation split using a given metric; however, since metrics aggregate ranks into a single value, there may be no significant differences among the ranks produced by several models, all of which must then be evaluated in the test phase.

In this paper, we contribute to the evaluation of link prediction as follows:
1) We propose a variation of the mean rank that considers the number of negative counterparts.
2) We define the anomaly coefficient of a predicate and integrate this coefficient into the protocol.
3) We propose a downscaling algorithm to generate training splits that reflect the original graph topology based on a nonparametric, unpaired statistical test.
4) During validation, we discard a learned model only if its output ranks are significantly different from the ranks of other models based on a nonparametric, paired statistical test.

Our experiments over seven well-known datasets show that translation-based methods (TransD, TransE and TransH) significantly outperform recent methods, which entails that our understanding of the accuracy of completion methods for link prediction is far from perfect.
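
To make the metrics named in the abstract concrete, the following is a minimal Python sketch of mean rank, mean reciprocal rank and hits at k, computed from the 1-based rank of each positive triple among its negative counterparts. The abstract does not give the formula of the proposed mean-rank variation, so `normalized_mean_rank` is only a hypothetical illustration of the underlying idea (dividing each rank by that triple's own number of negatives) and should not be read as the authors' metric. All function names and the sample data are assumptions made for this example.

```python
from statistics import mean


def mean_rank(ranks):
    """Average 1-based rank of the positive triples (lower is better)."""
    return mean(ranks)


def mean_reciprocal_rank(ranks):
    """Average of 1/rank over the positive triples (higher is better)."""
    return mean(1.0 / r for r in ranks)


def hits_at_k(ranks, k):
    """Fraction of positive triples ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


def normalized_mean_rank(ranks, num_negatives):
    """Hypothetical illustration, NOT the formula from the paper.

    Each rank is rescaled by that triple's own number of negatives, so
    0.0 means "ranked first" and 1.0 means "ranked last" regardless of
    how many negative counterparts the triple has.
    """
    return mean((r - 1) / n for r, n in zip(ranks, num_negatives))


# Made-up sample: rank of each positive triple and the number of
# negative counterparts it was ranked against.
ranks = [1, 2, 5]
num_negatives = [4, 100, 10]

print(mean_rank(ranks))
print(mean_reciprocal_rank(ranks))
print(hits_at_k(ranks, 3))
print(normalized_mean_rank(ranks, num_negatives))
```

In this sample, rank 2 out of 100 negatives and rank 5 out of 10 negatives contribute almost equally to the plain mean rank, whereas the per-triple rescaling separates them clearly, which is the kind of difference the first shortcoming points at.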


  • Published in

    WWW '21: Proceedings of the Web Conference 2021
    April 2021
    4054 pages
    ISBN: 9781450383127
    DOI: 10.1145/3442381

    Copyright © 2021 ACM


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Qualifiers

    • research-article
    • Research
    • Refereed limited

    Acceptance Rates

    Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%
