research-article

Distributed join algorithms on thousands of cores

Authors: Claude Barthels, Ingo Müller, Timo Schneider, Gustavo Alonso, Torsten Hoefler (ETH Zurich)

Proceedings of the VLDB Endowment, Volume 10, Issue 5, pp. 517–528. https://doi.org/10.14778/3055540.3055545

Published: 1 January 2017

Abstract

Traditional database operators such as joins are relevant not only in the context of database engines but also as a building block in many computational and machine learning algorithms. With the advent of big data, there is an increasing demand for efficient join algorithms that can scale with the input data size and the available hardware resources.

In this paper, we explore the implementation of distributed join algorithms on systems with several thousand cores connected by a low-latency network, as used in high-performance computing systems or data centers. We compare radix hash join with sort-merge join and discuss how to implement both at this scale. We explain how to use MPI to implement joins, show the impact and advantages of RDMA, discuss the importance of network scheduling, and study the relative performance of sorting versus hashing. The experimental results show that the algorithms we present scale well with the number of cores, reaching a throughput of 48.7 billion input tuples per second on 4,096 cores.
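The abstract describes the distributed joins only at a high level. The single-process Python sketch below is an illustration, not the authors' MPI code; it shows the common structure under simplifying assumptions: integer keys, one simulated "rank" per network partition, and partitioning on the low-order bits of the key. Each rank radix-partitions its share of both inputs, an all-to-all exchange co-locates matching partitions, and each rank then joins locally with either a hash join or a sort-merge join. All names (`radix_partition`, `exchange`, `distributed_join`, and so on) and the round-robin send schedule are hypothetical choices for this example. For simplicity both variants share the same radix redistribution; the paper's actual partitioning passes, buffer management, and network scheduling are not reproduced here. In an MPI implementation, the exchange step would correspond to an all-to-all transfer, for instance `MPI_Alltoallv` or one-sided `MPI_Put` operations into an RMA window.

```python
from collections import defaultdict
from typing import Callable, List, Tuple

Row = Tuple[int, int]  # (join key, payload); integer keys are an assumption


def radix_partition(rows: List[Row], num_bits: int) -> List[List[Row]]:
    """Split rows into 2**num_bits partitions by the lowest num_bits of the key."""
    mask = (1 << num_bits) - 1
    parts: List[List[Row]] = [[] for _ in range(1 << num_bits)]
    for key, payload in rows:
        parts[key & mask].append((key, payload))
    return parts


def exchange(parts_per_rank: List[List[List[Row]]]) -> List[List[Row]]:
    """Simulated all-to-all: partition p of every rank ends up on rank p.

    Hypothetical round-robin schedule: in step s, rank src sends to
    (src + s) % n, so each destination receives from exactly one source per step.
    """
    n = len(parts_per_rank)
    inbox: List[List[Row]] = [[] for _ in range(n)]
    for step in range(n):
        for src in range(n):
            dest = (src + step) % n
            inbox[dest].extend(parts_per_rank[src][dest])
    return inbox


def local_hash_join(inner: List[Row], outer: List[Row]) -> List[Tuple[int, int, int]]:
    """Build a hash table on the inner rows, then probe it with the outer rows."""
    table = defaultdict(list)
    for key, payload in inner:
        table[key].append(payload)
    # .get avoids inserting empty entries for keys that have no match
    return [(key, r, s) for key, s in outer for r in table.get(key, [])]


def local_sort_merge_join(inner: List[Row], outer: List[Row]) -> List[Tuple[int, int, int]]:
    """Sort both inputs by key, then merge runs of equal keys."""
    r, s = sorted(inner), sorted(outer)
    i = j = 0
    out: List[Tuple[int, int, int]] = []
    while i < len(r) and j < len(s):
        if r[i][0] < s[j][0]:
            i += 1
        elif r[i][0] > s[j][0]:
            j += 1
        else:
            # Find the run of equal keys on each side and emit their cross product.
            key, i_end, j_end = r[i][0], i, j
            while i_end < len(r) and r[i_end][0] == key:
                i_end += 1
            while j_end < len(s) and s[j_end][0] == key:
                j_end += 1
            out.extend((key, r[a][1], s[b][1]) for a in range(i, i_end) for b in range(j, j_end))
            i, j = i_end, j_end
    return out


def distributed_join(inner_per_rank: List[List[Row]], outer_per_rank: List[List[Row]],
                     network_bits: int, local_join: Callable) -> List[Tuple[int, int, int]]:
    """Partition on every rank, exchange, then join each rank's partitions locally."""
    n = len(inner_per_rank)
    assert n == len(outer_per_rank) == 1 << network_bits, "one network partition per rank"
    inner_recv = exchange([radix_partition(rows, network_bits) for rows in inner_per_rank])
    outer_recv = exchange([radix_partition(rows, network_bits) for rows in outer_per_rank])
    results: List[Tuple[int, int, int]] = []
    for rank in range(n):
        results.extend(local_join(inner_recv[rank], outer_recv[rank]))
    return results


if __name__ == "__main__":
    # Four simulated ranks, so two network radix bits.
    inner = [[(k, 1000 + k) for k in range(rank, 64, 4)] for rank in range(4)]
    outer = [[(k % 32, k) for k in range(16 * rank, 16 * rank + 16)] for rank in range(4)]
    hj = sorted(distributed_join(inner, outer, 2, local_hash_join))
    smj = sorted(distributed_join(inner, outer, 2, local_sort_merge_join))
    print(len(hj), hj == smj)
```

Both local joins return the same result set; they differ only in how matches are found (hash-table probes versus merging sorted runs), which is the sorting-versus-hashing trade-off that the paper evaluates at scale.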

Published in: Proceedings of the VLDB Endowment, Volume 10, Issue 5 (January 2017), 168 pages
ISSN: 2150-8097
Editor: Divesh Srivastava, AT&T Labs
Publisher: VLDB Endowment
