DOI: 10.1145/3292500.3330785
research-article

OAG: Toward Linking Large-scale Heterogeneous Entity Graphs

Published: 25 July 2019

ABSTRACT

Linking entities from different sources is a fundamental task in building open knowledge graphs. Despite much research in related fields, the challenges of linking large-scale heterogeneous entity graphs are far from resolved. Using two billion-scale academic entity graphs (Microsoft Academic Graph and AMiner) as sources for our study, we propose a unified framework, LinKG, to address the problem of building a large-scale linked entity graph. LinKG comprises three linking modules, each of which addresses one category of entities. To link word-sequence-based entities (e.g., venues), we present a long short-term memory network-based method for capturing the dependencies. To link large-scale entities (e.g., papers), we leverage locality-sensitive hashing and convolutional neural networks for scalable and precise linking. To link entities with ambiguity (e.g., authors), we propose heterogeneous graph attention networks to model different types of entities. Our extensive experiments and systematic analysis demonstrate that LinKG achieves linking accuracy with an F1-score of 0.9510, significantly outperforming the state-of-the-art. LinKG has been deployed to Microsoft Academic Search and AMiner to integrate the two large graphs. We have published the linked results as the Open Academic Graph (OAG, https://www.openacademic.ai/oag/), making it the largest publicly available heterogeneous academic graph to date.
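To make the role of locality-sensitive hashing in paper linking more concrete, the following is a minimal sketch of candidate generation between two title lists. The abstract does not say which hashing scheme LinKG uses, so MinHash with banding is swapped in here as one common LSH scheme. The function names (`shingles`, `minhash_signature`, `candidate_pairs`) and the parameters (`num_hashes`, `bands`, character 3-grams) are illustrative assumptions, not the authors' implementation.

```python
import hashlib
from collections import defaultdict


def shingles(title, k=3):
    """Return the set of character k-grams of a case- and space-normalized title."""
    text = " ".join(title.lower().split())
    return {text[i:i + k] for i in range(max(1, len(text) - k + 1))}


def _hash(seed, token):
    """Deterministic 64-bit hash of a token; the seed selects the hash function."""
    digest = hashlib.blake2b(
        token.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(title, num_hashes=32):
    """MinHash signature: the minimum hash of the shingle set under each seed."""
    grams = shingles(title)
    return tuple(min(_hash(seed, g) for g in grams) for seed in range(num_hashes))


def candidate_pairs(titles_a, titles_b, num_hashes=32, bands=8):
    """Return index pairs (i, j) whose signatures agree on at least one band.

    Only these pairs would be passed on to a more precise (and more
    expensive) matcher; all other pairs are never compared.
    """
    rows = num_hashes // bands
    buckets = defaultdict(lambda: (set(), set()))
    for side, titles in enumerate((titles_a, titles_b)):
        for idx, title in enumerate(titles):
            sig = minhash_signature(title, num_hashes)
            for b in range(bands):
                key = (b, sig[b * rows:(b + 1) * rows])
                buckets[key][side].add(idx)
    pairs = set()
    for left, right in buckets.values():
        pairs.update((i, j) for i in left for j in right)
    return pairs


if __name__ == "__main__":
    source_a = ["OAG: Toward Linking Large-scale Heterogeneous Entity Graphs"]
    source_b = [
        "oag:  toward linking large-scale heterogeneous entity graphs",
        "Long short-term memory",
    ]
    print(sorted(candidate_pairs(source_a, source_b)))
```

Titles that differ only in case or spacing normalize to the same shingle set and therefore always share a bucket. For titles that merely overlap, the chance of becoming a candidate pair grows with their Jaccard similarity, and the `bands` setting trades recall against the number of pairs to verify.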



    • Published in

      KDD '19: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
      July 2019
      3305 pages
      ISBN: 9781450362016
      DOI: 10.1145/3292500

      Copyright © 2019 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Acceptance Rates

      KDD '19 Paper Acceptance Rate: 110 of 1,200 submissions, 9%
      Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
