ABSTRACT
With the arrival of the new technological revolution, machines and communication media such as social media sites are causing the quantity of data to grow rapidly. As a result, size is the core and only facet that comes to mind at the mention of big data. This article attempts a comprehensive view of big data technologies, motivated by the swift, industry-driven evolution of data that leaves the academic press trying to catch up. It also offers a unified explanation of big data and of the associated analytics methods. A distinguishing practical feature of this paper is its focus on the core analytics associated with unstructured data, which makes up more than 90% of big data. Considerable work has been done to deal with complicated big data problems, and this paper analyzes contemporary big data technologies. The article further reinforces the need to formulate new tools for analytics. It presents not only a global overview of big data techniques but also an evaluation in terms of the Hadoop ecosystem. It also classifies and discusses the main features, challenges, and usage of these technologies.
Index Terms
- A Comprehensive Overview of BIG DATA Technologies: A Survey