Optimizing Parallel Recursive Datalog Evaluation on Multicore Machines

Authors:
Jiacheng Wu
Tsinghua University, Beijing, China

Jin Wang
University of California, Los Angeles, Los Angeles, CA, USA

Carlo Zaniolo
University of California, Los Angeles, Los Angeles, CA, USA

SIGMOD '22: Proceedings of the 2022 International Conference on Management of Data, June 2022, Pages 1433–1446, https://doi.org/10.1145/3514221.3517853

Published: 11 June 2022

ABSTRACT

In recent years, there has been a resurgence of interest in Datalog because of its ability to express applications that require recursive computation. Expressive power alone is not enough, however: supporting analytical tasks over ever-increasing volumes of data also requires high performance and scalability. In this paper, we present DCDatalog, an in-memory Datalog engine designed specifically for modern shared-memory multicore architectures. Our key contribution is a novel system architecture that supports a wide range of Datalog applications with a lightweight coordination scheme during parallel evaluation. To this end, we propose a dynamic scheduling strategy that generates the parallel execution plan on the fly while reducing concurrent accesses to shared memory. Experimental results on several large datasets show that our system significantly outperforms existing parallel Datalog engines and scales well with increasing amounts of data.
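
For readers unfamiliar with recursive Datalog evaluation, the following is a minimal sketch of semi-naive evaluation of the transitive-closure program, with each iteration's newly derived facts hash-partitioned across worker threads. It is an illustration only and not DCDatalog's design: the function name `transitive_closure`, the `workers` parameter, and the per-iteration global barrier are all assumptions made for this example. The barrier in particular is the kind of heavyweight coordination that the abstract says the system aims to make lightweight.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def transitive_closure(edges, workers=4):
    """Semi-naive evaluation of the Datalog program
         tc(X, Y) :- edge(X, Y).
         tc(X, Y) :- tc(X, Z), edge(Z, Y).
    Hypothetical helper for illustration; not taken from the paper.
    """
    # Index edge on its first column, the join key of the recursive rule.
    succ = defaultdict(set)
    for src, dst in edges:
        succ[src].add(dst)

    def join(partition):
        # Join one partition of the delta with edge. Workers only read
        # shared data and return private result sets.
        return {(x, y) for x, z in partition for y in succ.get(z, ())}

    total = set(edges)  # all tc facts derived so far
    delta = set(edges)  # facts that were new in the previous iteration
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while delta:
            # Hash-partition the delta on the join key z.
            parts = [[] for _ in range(workers)]
            for x, z in delta:
                parts[hash(z) % workers].append((x, z))
            # Consuming pool.map waits for every partition, so this line
            # acts as a global barrier at the end of each iteration.
            derived = set().union(*pool.map(join, parts))
            delta = derived - total  # keep only genuinely new facts
            total |= delta
    return total
```

Calling `transitive_closure([(1, 2), (2, 3)])` returns the two input edges plus the derived fact `(1, 3)`. Because CPython threads share the global interpreter lock, this sketch shows the structure of partitioned evaluation rather than any real speedup.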

Index Terms

Information systems
  Data management systems
    Database management system engines
      Parallel and distributed DBMSs
        Relational parallel and distributed DBMSs

Published in
SIGMOD '22: Proceedings of the 2022 International Conference on Management of Data
June 2022
2597 pages
ISBN: 9781450392495
DOI: 10.1145/3514221
General Chair: Zachary Ives, University of Pennsylvania (USA)
Program Chairs: Angela Bonifati, Lyon 1 University (France); Amr El Abbadi, University of California, Santa Barbara (USA)
Copyright © 2022 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
datalog
efficiency
multicore machine
query processing
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 785 of 4,003 submissions, 20%