DOI: 10.1145/3514221.3526049
research-article

Saga: A Platform for Continuous Construction and Serving of Knowledge at Scale

Published: 11 June 2022

ABSTRACT

We introduce Saga, a next-generation knowledge construction and serving platform for powering knowledge-based applications at industrial scale. Saga follows a hybrid batch-incremental design to continuously integrate billions of facts about real-world entities and construct a central knowledge graph that supports multiple production use cases with diverse requirements around data freshness, accuracy, and availability. In this paper, we discuss the unique challenges associated with knowledge graph construction at industrial scale, and review the main components of Saga and how they address these challenges. Finally, we share lessons learned from a wide array of production use cases powered by Saga.

Published in

SIGMOD '22: Proceedings of the 2022 International Conference on Management of Data
June 2022
2597 pages
ISBN: 9781450392495
DOI: 10.1145/3514221

Copyright © 2022 ACM

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

Overall Acceptance Rate: 785 of 4,003 submissions, 20%
