Hadoop-Based Big Data Distributions: A Comparative Study

  • Conference paper
Emerging Trends in Intelligent Systems & Network Security (NISS 2022)

Abstract

Approximately 2.5 quintillion bytes of data in various forms (structured, semi-structured, or unstructured) are generated every day. Big data technologies emerged to overcome the limitations of traditional methods, which can no longer store and process such large volumes of heterogeneous data. Hadoop is an open-source big data solution created to store, process, and manage huge volumes of data of different types. Over the last decade, many companies have developed their own distributions based on the Hadoop ecosystem. This paper presents the most popular Hadoop distributions, including MapR, Hortonworks, Cloudera, IBM InfoSphere BigInsights, Amazon Elastic MapReduce, Azure HDInsight, Pivotal HD, and Qubole, and then provides a detailed comparison of them.

Author information

Corresponding author

Correspondence to Ikram Hamdaoui.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Hamdaoui, I., El Fissaoui, M., El Makkaoui, K., El Allali, Z. (2023). Hadoop-Based Big Data Distributions: A Comparative Study. In: Ben Ahmed, M., Abdelhakim, B.A., Ane, B.K., Rosiyadi, D. (eds) Emerging Trends in Intelligent Systems & Network Security. NISS 2022. Lecture Notes on Data Engineering and Communications Technologies, vol 147. Springer, Cham. https://doi.org/10.1007/978-3-031-15191-0_24
