research-article

A Study of Main-Memory Hash Joins on Many-core Processor: A Case with Intel Knights Landing Architecture

Authors:
Xuntao Cheng, Nanyang Technological University, Singapore, Singapore
Bingsheng He, National University of Singapore, Singapore, Singapore
Xiaoli Du, National University of Defense Technology, Changsha, China
Chiew Tong Lau, Nanyang Technological University, Singapore, Singapore

CIKM '17: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, November 2017, Pages 657–666, https://doi.org/10.1145/3132847.3132916

Published: 06 November 2017

ABSTRACT

Advanced processor architectures have recently driven new designs, implementations, and optimizations of main-memory hash join algorithms. The newly released Intel Xeon Phi many-core processor of the Knights Landing architecture (KNL) offers notable hardware features, including many low-frequency out-of-order cores connected by a 2D mesh and high-bandwidth multi-channel memory (MCDRAM). In this paper, we experimentally revisit state-of-the-art main-memory hash join algorithms to study how the new hardware features of KNL affect algorithmic design and tuning, and to identify opportunities for further performance improvement on KNL. Our experiments show that, although many existing optimizations remain valid on KNL with proper tuning, even the state-of-the-art algorithms severely underutilize the memory bandwidth and other hardware resources.
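For readers unfamiliar with the operator under study, the following is a minimal, single-threaded sketch of the build and probe phases that main-memory hash joins share. It is an illustration only and not the implementation evaluated in the paper: the function name `hash_join`, the `(key, payload)` tuple layout, and the sample relations are assumptions made for this example, and it omits the partitioning, SIMD, and memory-placement tuning that the abstract refers to.

```python
from collections import defaultdict


def hash_join(build_rel, probe_rel):
    """Equi-join two lists of (key, payload) tuples (hypothetical layout)."""
    # Build phase: index the (ideally smaller) relation by join key.
    table = defaultdict(list)
    for key, payload in build_rel:
        table[key].append(payload)

    # Probe phase: look up each probe tuple and emit every matching pair.
    result = []
    for key, payload in probe_rel:
        # .get() avoids inserting empty buckets for keys with no match.
        for build_payload in table.get(key, ()):
            result.append((key, build_payload, payload))
    return result


# Hypothetical sample relations, used only to exercise the sketch.
r = [(1, "r1"), (2, "r2"), (2, "r2b")]
s = [(2, "s2"), (3, "s3"), (1, "s1")]
joined = hash_join(r, s)
```

Both phases are dominated by random accesses into the hash table, which is why memory bandwidth and latency on a many-core processor matter so much for this operator.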

Index Terms

1. Information systems
  1. Data management systems
    1. Database management system engines
      1. Database query processing
        Join algorithms
        Query operators

Published in
CIKM '17: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management
November 2017
2604 pages
ISBN: 9781450349185
DOI: 10.1145/3132847
General Chairs:
Ee-Peng Lim, Singapore Management University, Singapore
Marianne Winslett, University of Illinois at Urbana-Champaign, USA, and Advanced Digital Sciences Center, Singapore
Program Chairs:
Mark Sanderson, RMIT, Australia
Ada Fu, Chinese University of Hong Kong, Hong Kong
Jimeng Sun, Georgia Tech, USA
Shane Culpepper, RMIT, Australia
Eric Lo, Chinese University of Hong Kong, Hong Kong
Joyce Ho, Emory University, USA
Debora Donato, Mix Tech, Inc., USA
Rakesh Agrawal, Data Insights Laboratories, USA
Yu Zheng, Microsoft Research Asia, China
Carlos Castillo, Qatar Computing Research Institute, Qatar
Aixin Sun, Nanyang Technological University, Singapore
Vincent S. Tseng, National Cheng Kung University, Taiwan
Chenliang Li, Wuhan University, China
Copyright © 2017 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
database operators
hash join algorithms
many-core processor
Acceptance Rates
CIKM '17 Paper Acceptance Rate: 171 of 855 submissions, 20%
Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%