Abstract

Hadoop is a well-designed approach for handling massive amounts of data. Built around the Hadoop Distributed File System (HDFS) and MapReduce, it schedules processing by orchestrating distributed servers while providing redundancy and fault tolerance. In terms of raw performance, however, Hadoop still falls short of high-performance computing because of the limited parallelism of CPUs. GPU-accelerated computing uses a GPU together with a CPU to speed up applications, so a GPU cluster can process data more efficiently. A GPU cluster, however, offers limited data storage capacity. In this chapter, we exploit a hybrid model of GPU and Hadoop to make the best use of both, and we present the design and implementation of applications that combine Hadoop and CUDA through two interfaces: Hadoop Streaming and Hadoop Pipes. Experimental results for the K-means algorithm are presented and their performance is discussed.
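As an illustration of the Hadoop Streaming interface named in the abstract, the following is a minimal sketch of a K-means assignment mapper. It is not the chapter's implementation: the centroid values, the function names, and the comma-separated input and tab-separated output formats are all assumptions made for this example. The distance computation runs on the CPU here; in a GPU-backed mapper this is the step that would presumably be handed to a CUDA kernel.

```python
import sys

# Hypothetical centroids for illustration only; a real job would load the
# current centroids from a file distributed to every mapper.
CENTROIDS = [(0.0, 0.0), (10.0, 10.0)]


def nearest_centroid(point, centroids):
    """Return the index of the centroid closest to point (squared Euclidean distance)."""
    best_index, best_dist = 0, float("inf")
    for index, centroid in enumerate(centroids):
        dist = sum((p - c) ** 2 for p, c in zip(point, centroid))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def map_lines(lines, centroids=CENTROIDS):
    """Yield 'cluster_id<TAB>point' records, one per non-empty input line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        point = tuple(float(x) for x in line.split(","))
        yield f"{nearest_centroid(point, centroids)}\t{line}"


def main():
    """Streaming entry point: read points from stdin, write records to stdout."""
    for record in map_lines(sys.stdin):
        print(record)
```

A streaming mapper script would call `main()` at the bottom of the file. Hadoop Streaming passes each input split to the script on standard input and groups the emitted records by the key before the tab, so a reducer receives all points assigned to one cluster and can average them into a new centroid.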

Author information

Corresponding author

Correspondence to Tien-Hsiung Weng.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Chen, W. et al. (2016). GPU Computations on Hadoop Clusters for Massive Data Processing. In: Juang, J. (eds) Proceedings of the 3rd International Conference on Intelligent Technologies and Engineering Systems (ICITES2014). Lecture Notes in Electrical Engineering, vol 345. Springer, Cham. https://doi.org/10.1007/978-3-319-17314-6_66

  • DOI: https://doi.org/10.1007/978-3-319-17314-6_66

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-17313-9

  • Online ISBN: 978-3-319-17314-6

  • eBook Packages: Engineering, Engineering (R0)