Abstract
Hadoop is a well-designed approach for handling massive amounts of data. Built around the Hadoop File System and MapReduce, it schedules processing across distributed servers while providing redundancy and fault tolerance. In terms of performance, however, Hadoop still falls short of high-performance computing because of the limited parallelism of CPUs. GPU-accelerated computing uses a GPU together with a CPU to speed up applications, so data can be processed more efficiently on a GPU cluster. A GPU cluster, however, has limited data storage capacity. In this chapter, we exploit a hybrid model of GPU and Hadoop to make the best use of both capabilities, and we present the design and implementation of an application using Hadoop and CUDA through two interfaces: Hadoop Streaming and Hadoop Pipes. Experimental results on the K-means algorithm are presented and their performance is discussed.
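As a rough illustration of how one K-means iteration can be split into a map step and a reduce step of the kind Hadoop Streaming runs, the following minimal sketch may help. It is not the chapter's implementation: the distance computation runs on the CPU here, whereas a hybrid design would offload it to a CUDA kernel. The names `CENTROIDS`, `nearest`, `map_lines` and `reduce_lines`, the comma-separated input format, and the fixed centroid values are all assumptions made for this example.

```python
from itertools import groupby

# Hypothetical fixed centroids; a real job would load the current centroids
# from a side file distributed to every mapper.
CENTROIDS = [(0.0, 0.0), (10.0, 10.0)]

def nearest(point, centroids):
    """Return the index of the centroid closest to point (squared Euclidean)."""
    dists = [sum((p - c) ** 2 for p, c in zip(point, centroid))
             for centroid in centroids]
    return dists.index(min(dists))

def map_lines(lines, centroids):
    """Map step: emit 'cluster_id<TAB>point' for every comma-separated point."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        point = tuple(float(v) for v in line.split(","))
        yield f"{nearest(point, centroids)}\t{line}"

def reduce_lines(lines):
    """Reduce step: input must be sorted by key; emit each cluster's mean."""
    keyed = (line.rstrip("\n").split("\t", 1) for line in lines)
    for key, group in groupby(keyed, key=lambda kv: kv[0]):
        points = [[float(v) for v in value.split(",")] for _, value in group]
        mean = [sum(col) / len(points) for col in zip(*points)]
        yield f"{key}\t{','.join(f'{m:.4f}' for m in mean)}"
```

Both functions work on plain iterables of text lines, so a thin wrapper that reads standard input and prints each record would be enough to use them as a streaming mapper and reducer. The sort between the two steps is what Hadoop's shuffle phase provides.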
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Chen, W. et al. (2016). GPU Computations on Hadoop Clusters for Massive Data Processing. In: Juang, J. (eds) Proceedings of the 3rd International Conference on Intelligent Technologies and Engineering Systems (ICITES2014). Lecture Notes in Electrical Engineering, vol 345. Springer, Cham. https://doi.org/10.1007/978-3-319-17314-6_66
DOI: https://doi.org/10.1007/978-3-319-17314-6_66
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-17313-9
Online ISBN: 978-3-319-17314-6
eBook Packages: Engineering, Engineering (R0)