DOI: 10.1145/3514221.3517853
research-article
Open Access

Optimizing Parallel Recursive Datalog Evaluation on Multicore Machines

Published: 11 June 2022

ABSTRACT

Over the past years, there has been a resurgence of interest in Datalog due to its superior ability to express applications that require recursive computations. However, beyond expressive power, supporting analytical tasks over ever-increasing volumes of data requires high performance and scalability. In this paper, we present DCDatalog, an in-memory Datalog engine specifically designed for modern shared-memory multicore architectures. Our key contribution is a novel system architecture that supports a wide range of Datalog applications with a lightweight coordination scheme during parallel evaluation. To this end, we propose a dynamic scheduling strategy that generates the parallel execution plan on the fly while reducing concurrent accesses to shared memory. Experimental results on several large datasets show that our system significantly outperforms existing parallel Datalog engines and scales well with increasing amounts of data.
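
To make "recursive computations" concrete, the sketch below evaluates the classic transitive-closure program with semi-naive evaluation, the standard bottom-up technique for recursive Datalog. It is a sequential textbook illustration only: it does not reproduce DCDatalog's architecture, coordination scheme, or scheduling strategy, and the function name `transitive_closure` is a hypothetical choice made for this example.

```python
from collections import defaultdict


def transitive_closure(edges):
    """Semi-naive evaluation of the recursive Datalog program

        tc(X, Y) :- edge(X, Y).
        tc(X, Y) :- tc(X, Z), edge(Z, Y).

    Illustrative, single-threaded sketch; not the engine described above.
    """
    # Index edge facts by source so the join on Z is a dictionary lookup.
    successors = defaultdict(set)
    for src, dst in edges:
        successors[src].add(dst)

    tc = set(edges)     # all facts derived so far (seeded by the exit rule)
    delta = set(edges)  # facts that were new in the previous iteration
    while delta:
        # Apply the recursive rule only to the facts from the last round.
        derived = {(x, y) for (x, z) in delta for y in successors.get(z, ())}
        delta = derived - tc  # keep only facts not seen before
        tc |= delta
    return tc
```

The loop stops once an iteration derives no new facts, that is, when a fixpoint is reached. Joining only the newly derived `delta` facts against `edge` in each round avoids recomputing facts that earlier rounds already produced.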

Published in

SIGMOD '22: Proceedings of the 2022 International Conference on Management of Data
June 2022
2597 pages
ISBN: 9781450392495
DOI: 10.1145/3514221

Copyright © 2022 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

Overall acceptance rate: 785 of 4,003 submissions (20%)