Abstract
Approximately 2.5 quintillion bytes of data, whether structured, semi-structured, or unstructured, are generated every day. Big data technologies emerged to overcome the limitations of traditional methods, which can no longer store and process such large volumes of heterogeneous data. Hadoop is an open-source big data framework designed to store, process, and manage huge volumes of data of different types. Over the last decade, many companies have developed their own Hadoop distributions based on the Hadoop ecosystem. This paper presents the most popular Hadoop distributions, including MapR, Hortonworks, Cloudera, IBM InfoSphere BigInsights, Amazon Elastic MapReduce, Azure HDInsight, Pivotal HD, and Qubole. It then provides a detailed comparison of these distributions.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Hamdaoui, I., El Fissaoui, M., El Makkaoui, K., El Allali, Z. (2023). Hadoop-Based Big Data Distributions: A Comparative Study. In: Ben Ahmed, M., Abdelhakim, B.A., Ane, B.K., Rosiyadi, D. (eds) Emerging Trends in Intelligent Systems & Network Security. NISS 2022. Lecture Notes on Data Engineering and Communications Technologies, vol 147. Springer, Cham. https://doi.org/10.1007/978-3-031-15191-0_24
DOI: https://doi.org/10.1007/978-3-031-15191-0_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-15190-3
Online ISBN: 978-3-031-15191-0
eBook Packages: Intelligent Technologies and Robotics; Intelligent Technologies and Robotics (R0)