ABSTRACT
Distributed networks and real-time systems are becoming central components of the Internet of Things (IoT), in which sensors generate huge data streams and existing legacy systems contribute further data sets. This data makes it possible to measure, infer and understand environmental indicators, from delicate ecologies and natural resources to urban environments, through the analysis of heterogeneous (structured and unstructured) data sources. In this paper, we propose a distributed framework, the Event STream Processing Engine for Environmental Monitoring Domain (ESTemd), for applying stream processing to heterogeneous environmental data. Our work demonstrates the useful role big data techniques can play in environmental decision support, early warning and forecasting systems. The proposed framework addresses the challenge of data heterogeneity across source systems and offers real-time processing of huge environmental datasets through a publish/subscribe method over a unified data pipeline built on Apache Kafka for real-time analytics.
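
The publish/subscribe idea behind the unified data pipeline can be illustrated with a minimal, self-contained sketch. It is not the ESTemd implementation and does not use the Apache Kafka client API: the `Broker` class is a toy in-memory stand-in for a broker, and the topic name, record fields, source formats and the 3.0 m threshold are all hypothetical, chosen only to show two differently shaped sources being normalised into one stream that a subscriber analyses as records arrive.

```python
import json
from collections import defaultdict
from typing import Callable, Dict, List


class Broker:
    """Toy in-memory stand-in for a publish/subscribe broker (assumption, not Kafka)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        # Register a handler that is called for every record published to the topic.
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, record: dict) -> None:
        # Deliver the record to every handler subscribed to the topic.
        for handler in self._subscribers[topic]:
            handler(record)


def from_csv_row(row: str) -> dict:
    """Normalise a hypothetical legacy CSV row: station,variable,value."""
    station, variable, value = row.split(",")
    return {"station": station, "variable": variable, "value": float(value)}


def from_json_text(text: str) -> dict:
    """Normalise a hypothetical sensor JSON message with different field names."""
    raw = json.loads(text)
    return {"station": raw["id"], "variable": raw["type"], "value": float(raw["reading"])}


alerts: List[str] = []


def flood_watch(record: dict) -> None:
    # Hypothetical early-warning rule: flag river levels above 3.0 m.
    if record["variable"] == "river_level_m" and record["value"] > 3.0:
        alerts.append(record["station"])


broker = Broker()
broker.subscribe("environment", flood_watch)

# Two heterogeneous sources publish to the same topic after normalisation.
broker.publish("environment", from_csv_row("S1,river_level_m,3.4"))
broker.publish("environment", from_json_text('{"id": "S2", "type": "river_level_m", "reading": 2.1}'))

print(alerts)  # ['S1']
```

In a real deployment the broker would be a distributed, persistent log rather than a dictionary of callbacks, but the separation of concerns is the same: producers only normalise and publish, and consumers only subscribe and analyse.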