Saga: A Platform for Continuous Construction and Serving of Knowledge at Scale

Authors:
Ihab F. Ilyas (Apple, Seattle, WA, USA)
Theodoros Rekatsinas (Apple, Seattle, WA, USA)
Vishnu Konda (Apple, Cupertino, CA, USA)
Jeffrey Pound (Apple, Waterloo, ON, Canada)
Xiaoguang Qi (Apple, Seattle, WA, USA)
Mohamed Soliman (Apple, Cupertino, CA, USA)

SIGMOD '22: Proceedings of the 2022 International Conference on Management of Data, June 2022, Pages 2259–2272. https://doi.org/10.1145/3514221.3526049

Published: 11 June 2022

ABSTRACT

We introduce Saga, a next-generation knowledge construction and serving platform for powering knowledge-based applications at industrial scale. Saga follows a hybrid batch-incremental design to continuously integrate billions of facts about real-world entities and construct a central knowledge graph that supports multiple production use cases with diverse requirements around data freshness, accuracy, and availability. In this paper, we discuss the unique challenges associated with knowledge graph construction at industrial scale and review the main components of Saga and how they address these challenges. Finally, we share lessons learned from a wide array of production use cases powered by Saga.

Index Terms

1. Computer systems organization
  1. Architectures
    1. Other architectures
2. Information systems
  1. Data management systems
    1. Information integration
  2. World Wide Web
    1. Web mining
      1. Data extraction and integration
        1. Surfacing

Published in
SIGMOD '22: Proceedings of the 2022 International Conference on Management of Data
June 2022
2597 pages
ISBN:9781450392495
DOI:10.1145/3514221
General Chair: Zachary Ives, University of Pennsylvania (USA)
Program Chairs: Angela Bonifati, Lyon 1 University (France); Amr El Abbadi, University of California, Santa Barbara (USA)
Copyright © 2022 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
entity linking
entity resolution
knowledge graph construction
knowledge graphs
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 785 of 4,003 submissions, 20%