GPU Computations on Hadoop Clusters for Massive Data Processing

Chen, Wenbo; Xu, Shungou; Jiang, Hai; Weng, Tien-Hsiung; Marino, Mario Donato; Chen, Yi-Siang; Li, Kuan-Ching

doi:10.1007/978-3-319-17314-6_66

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 345)

Abstract

Hadoop is a well-designed framework for handling massive amounts of data. Built around the Hadoop Distributed File System (HDFS) and MapReduce at its core, it schedules processing by orchestrating distributed servers and provides redundancy and fault tolerance. In terms of performance, however, Hadoop still falls short of high-performance computing capacity because of the limited parallelism of CPUs. GPU-accelerated computing uses a GPU together with a CPU to accelerate applications, allowing data processing on a GPU cluster to reach higher efficiency. A GPU cluster, however, offers only limited data storage capacity. In this chapter, we exploit a hybrid model of GPUs and Hadoop to make the best use of both capabilities, and we present the design and implementation of applications using Hadoop and CUDA through two interfaces: Hadoop Streaming and Hadoop Pipes. Experimental results for the K-means algorithm are presented, and their performance is discussed.
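As a rough illustration of how K-means maps onto the Streaming interface, the sketch below expresses one K-means iteration as a text-based mapper and reducer exchanging tab-separated key/value lines, which is Hadoop Streaming's default record format. It is a minimal sketch under stated assumptions, not the chapter's implementation: the comma-separated point format, the function names (`kmeans_map`, `kmeans_reduce`, `run_iteration`) and the in-memory sort standing in for Hadoop's shuffle are all hypothetical, and the nearest-centroid search, the part the chapter accelerates with CUDA, runs here as a plain CPU loop.

```python
import math
from itertools import groupby


def parse_point(text):
    # Hypothetical input format: one point per line, coordinates comma-separated.
    return [float(v) for v in text.split(",")]


def nearest_centroid(point, centroids):
    # In the chapter's hybrid design this distance search is the GPU-bound step;
    # this sketch assumes a plain CPU loop over squared Euclidean distances.
    best_id, best_dist = 0, math.inf
    for cid, c in enumerate(centroids):
        d = sum((p - q) ** 2 for p, q in zip(point, c))
        if d < best_dist:
            best_id, best_dist = cid, d
    return best_id


def kmeans_map(lines, centroids):
    """Mapper: emit '<centroid id>\t<point>' for every non-empty input line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        yield f"{nearest_centroid(parse_point(line), centroids)}\t{line}"


def kmeans_reduce(sorted_lines):
    """Reducer: input grouped by key; emit '<centroid id>\t<new centroid>'."""
    pairs = (l.rstrip("\n").split("\t", 1) for l in sorted_lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        points = [parse_point(v) for _, v in group]
        n = len(points)
        mean = [sum(coords) / n for coords in zip(*points)]
        yield f"{key}\t" + ",".join(f"{x:g}" for x in mean)


def run_iteration(lines, centroids):
    """Hypothetical local stand-in for one job: map, sort by key, reduce."""
    mapped = sorted(kmeans_map(lines, centroids), key=lambda l: l.split("\t", 1)[0])
    return list(kmeans_reduce(mapped))
```

In an actual Streaming job the mapper and reducer would be separate executables reading `sys.stdin` and writing `sys.stdout`, with the current centroids shipped to every mapper and a driver rerunning the job until the centroids stop moving. A centroid that receives no points produces no reducer output, so such a driver would need to keep its previous value.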

Author information

Authors and Affiliations

School of Information and Technology of Lanzhou University, Lanzhou, China
Wenbo Chen & Shungou Xu
Department of Computer Science, Arkansas State University, Jonesboro, AR, USA
Hai Jiang
Department of Computer Science and Information Engineering, Providence University, Taichung, Taiwan
Tien-Hsiung Weng, Yi-Siang Chen & Kuan-Ching Li
Piazzale Umbria 15, Sanfatucchio, (PG), 06060, Italy
Mario Donato Marino

Corresponding author

Correspondence to Tien-Hsiung Weng.

Editor information

Editors and Affiliations

School of Engineering, Mercer University, Macon, Georgia, USA
Jengnan Juang

About this paper

Cite this paper

Chen, W. et al. (2016). GPU Computations on Hadoop Clusters for Massive Data Processing. In: Juang, J. (eds) Proceedings of the 3rd International Conference on Intelligent Technologies and Engineering Systems (ICITES2014). Lecture Notes in Electrical Engineering, vol 345. Springer, Cham. https://doi.org/10.1007/978-3-319-17314-6_66

DOI: https://doi.org/10.1007/978-3-319-17314-6_66
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-17313-9
Online ISBN: 978-3-319-17314-6
eBook Packages: Engineering, Engineering (R0)