DOI:10.1145/3459637.3482224
research-article
Open Access

A Lightweight Knowledge Graph Embedding Framework for Efficient Inference and Storage

Published: 30 October 2021

ABSTRACT

Knowledge graphs, which consist of entities and the relations between them, have become a popular way to store structured knowledge. Knowledge graph embedding (KGE), which learns a vector representation for each entity and relation, is widely used to capture the semantics of a knowledge graph and has proven successful in many downstream applications, such as retrieving entities similar to a query entity. However, existing KGE methods do not scale well to emerging large-scale knowledge graphs because of constraints on storage and inference efficiency. In this paper, we propose a lightweight KGE model, LightKG, which significantly reduces both the storage and the running time needed for inference. Instead of storing a continuous vector for every entity, LightKG stores only a few codebooks, each containing codewords that act as representatives of the embeddings, together with, for each entity, the indices of its selected codewords. This makes storage highly efficient. LightKG also speeds up downstream querying considerably, because the relevance score between a query and an entity can be computed by a quick look-up in a table that holds the scores between the query and the codewords. Both efficiency gains follow from LightKG's design. LightKG is an end-to-end framework that automatically infers codebooks and codewords and generates an approximate embedding for each entity. A residual module is included to induce diversity among codebooks, and a continuous function is adopted to approximate codeword selection, which is non-differentiable. In addition, to further improve the performance of KGE, we propose a novel dynamic negative sampling method based on quantization, which can be applied to LightKG or to other KGE methods.
We conduct extensive experiments on five public datasets. The experiments show that LightKG is search- and memory-efficient while retaining high approximate search accuracy. In addition, dynamic negative sampling dramatically improves model performance, with over 19% improvement on average.
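The abstract says that codeword selection is not differentiable and that a continuous function stands in for it, but it does not name that function. The following is a minimal sketch under the assumption that a temperature-scaled softmax is used as the surrogate; the names `hard_select`, `soft_select`, and `temperature` are hypothetical and are not taken from the paper.

```python
import math

def softmax(logits, temperature):
    """Numerically stable softmax over logits divided by a temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def hard_select(codebook, logits):
    """Discrete selection: return the codeword with the largest logit.
    The argmax has no useful gradient with respect to the logits."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return codebook[best]

def soft_select(codebook, logits, temperature):
    """Assumed surrogate: softmax-weighted average of all codewords.
    It is smooth in the logits and approaches hard_select as temperature -> 0."""
    weights = softmax(logits, temperature)
    dim = len(codebook[0])
    return [sum(w * word[d] for w, word in zip(weights, codebook))
            for d in range(dim)]

# Toy codebook with three 2-dimensional codewords (illustrative values only).
codebook = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
logits = [0.2, 1.5, -0.3]

hard = hard_select(codebook, logits)
soft_warm = soft_select(codebook, logits, temperature=1.0)
soft_cold = soft_select(codebook, logits, temperature=0.01)
```

At a high temperature the surrogate blends several codewords, while at a low temperature it nearly coincides with the discrete choice, which is why such relaxations are often annealed during training.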
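To make the storage and look-up idea in the abstract concrete, here is a minimal sketch. It assumes that an entity's approximate embedding is the sum of one codeword from each codebook and that relevance is an inner product; the abstract states neither detail, so both are assumptions, and all function names and sizes (`B`, `W`, `D`, `N`) are hypothetical.

```python
import random

def build_codebooks(num_codebooks, num_codewords, dim, rng):
    """Random stand-in codebooks: codebooks[b][w] is a dim-dimensional codeword."""
    return [[[rng.gauss(0.0, 1.0) for _ in range(dim)]
             for _ in range(num_codewords)]
            for _ in range(num_codebooks)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def reconstruct(codebooks, codes):
    """Assumed composition: sum the selected codeword of every codebook."""
    dim = len(codebooks[0][0])
    vec = [0.0] * dim
    for book, idx in zip(codebooks, codes):
        for d in range(dim):
            vec[d] += book[idx][d]
    return vec

def score_table(codebooks, query):
    """table[b][w] = <query, codeword w of codebook b>, built once per query."""
    return [[dot(query, word) for word in book] for book in codebooks]

def lookup_score(table, codes):
    """Score one entity with one table look-up per codebook."""
    return sum(table[b][idx] for b, idx in enumerate(codes))

rng = random.Random(0)
B, W, D, N = 4, 16, 8, 100  # codebooks, codewords per codebook, dimension, entities
codebooks = build_codebooks(B, W, D, rng)
entity_codes = [[rng.randrange(W) for _ in range(B)] for _ in range(N)]
query = [rng.gauss(0.0, 1.0) for _ in range(D)]

table = score_table(codebooks, query)
fast = [lookup_score(table, codes) for codes in entity_codes]
slow = [dot(query, reconstruct(codebooks, codes)) for codes in entity_codes]

# Storage comparison in stored values: full vectors vs. codebooks plus indices.
full_floats = N * D
compact_floats = B * W * D
compact_indices = N * B
```

Under these assumptions the two scoring paths agree up to floating-point rounding, because the inner product distributes over the sum of codewords. The codebook cost `B * W * D` does not depend on the number of entities, so the saving grows as `N` increases.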

Supplemental Material

20210927_012342.mp4 (mp4, 28.6 MB)

    • Published in

      CIKM '21: Proceedings of the 30th ACM International Conference on Information & Knowledge Management
      October 2021
      4966 pages
      ISBN:9781450384469
      DOI:10.1145/3459637

      Copyright © 2021 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
