research-article

Distributed join algorithms on thousands of cores

Published: 1 January 2017

Abstract

Traditional database operators such as joins are relevant not only in the context of database engines but also as a building block in many computational and machine learning algorithms. With the advent of big data, there is an increasing demand for efficient join algorithms that can scale with the input data size and the available hardware resources.

In this paper, we explore the implementation of distributed join algorithms in systems with several thousand cores connected by a low-latency network, as used in high-performance computing systems or data centers. We compare radix hash join to sort-merge join algorithms and discuss their implementation at this scale. We explain how to use MPI to implement joins, show the impact and advantages of RDMA, discuss the importance of network scheduling, and study the relative performance of sorting vs. hashing. The experimental results show that the algorithms we present scale well with the number of cores, reaching a throughput of 48.7 billion input tuples per second on 4,096 cores.
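To make the radix hash join named above concrete, the following is a minimal single-process sketch in Python. It is not the implementation evaluated in the paper: the function names `radix_partition` and `radix_hash_join`, the `(key, payload)` tuple layout, and the default of 4 radix bits are assumptions chosen for illustration. In a distributed setting, each partition would additionally be assigned to a process and shipped over the network (for example with MPI); that exchange step is omitted here.

```python
from collections import defaultdict


def radix_partition(tuples, bits):
    """Split (key, payload) tuples into 2**bits partitions by the low-order key bits."""
    mask = (1 << bits) - 1
    partitions = [[] for _ in range(1 << bits)]
    for key, payload in tuples:
        partitions[key & mask].append((key, payload))
    return partitions


def radix_hash_join(r, s, bits=4):
    """Equi-join r and s on integer keys: partition both inputs, then build and probe."""
    result = []
    # Tuples with equal keys always land in the same partition index,
    # so each pair of partitions can be joined independently.
    for r_part, s_part in zip(radix_partition(r, bits), radix_partition(s, bits)):
        table = defaultdict(list)
        for key, payload in r_part:  # build phase: hash table over the r partition
            table[key].append(payload)
        for key, payload in s_part:  # probe phase: look up each s tuple
            for r_payload in table.get(key, ()):
                result.append((key, r_payload, payload))
    return result


r = [(1, "a"), (2, "b"), (17, "c")]
s = [(1, "x"), (17, "y"), (3, "z"), (1, "w")]
print(sorted(radix_hash_join(r, s)))
```

Because the partitions are independent, the per-partition build and probe work can be spread across cores or machines, which is the property a distributed radix join relies on.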

Published in

Proceedings of the VLDB Endowment, Volume 10, Issue 5
January 2017
168 pages
ISSN: 2150-8097

Publisher

VLDB Endowment
