ABSTRACT
In recent years, there has been a resurgence of interest in Datalog due to its ability to express applications that require recursive computations. However, expressive power alone is not enough: supporting analytical tasks over ever-increasing volumes of data also requires high performance and scalability. In this paper, we present DCDatalog, an in-memory Datalog engine specifically designed for modern shared-memory multicore architectures. Our key contribution is a novel system architecture that supports a wide range of Datalog applications with a lightweight coordination scheme during parallel evaluation. To this end, we propose a dynamic scheduling strategy that generates the parallel execution plan on the fly while reducing concurrent accesses to shared memory. Experimental results on several large datasets show that our system significantly outperforms existing parallel Datalog engines and scales well with increasing amounts of data.
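To make "recursive computations" concrete, the following is a minimal single-threaded sketch of how a recursive Datalog program can be evaluated bottom-up to a fixpoint. It uses the textbook semi-naive strategy on the classic transitive-closure program; it is not DCDatalog's architecture or scheduling strategy, and the function name `transitive_closure` and the relation names `edge` and `tc` are assumptions chosen for illustration.

```python
from collections import defaultdict


def transitive_closure(edges):
    """Semi-naive bottom-up evaluation of the Datalog program

        tc(X, Y) :- edge(X, Y).
        tc(X, Y) :- tc(X, Z), edge(Z, Y).

    Illustrative only: a sequential sketch, not the paper's parallel engine.
    """
    # Index edge(Z, Y) by Z so the join in the recursive rule is a lookup.
    succ = defaultdict(set)
    for src, dst in edges:
        succ[src].add(dst)

    tc = set(edges)      # all facts derived so far (base rule applied)
    delta = set(edges)   # facts that were new in the previous iteration

    while delta:
        # Join only the new facts with edge, instead of rejoining all of tc.
        derived = {(x, y) for (x, z) in delta for y in succ.get(z, ())}
        delta = derived - tc   # keep only facts not seen before
        tc |= delta            # fixpoint is reached when delta is empty

    return tc
```

Each loop iteration joins only the facts discovered in the previous iteration, which is the source of the per-iteration work that a parallel engine would have to partition and coordinate across cores.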
Index Terms
- Optimizing Parallel Recursive Datalog Evaluation on Multicore Machines