Abstract
This study examines the determinants of open data (OD) in prediction using machine learning (ML) by reviewing the current state of research. As open government data (OGD) and social networking services (SNSs) have grown rapidly, OD is considered the most significant trend for users seeking to enhance their decision-making process. The purpose of the study was to identify the proliferation of OD in ML approaches to generating decisions through a systematic literature review (SLR) and to map the outcomes as trends. In this systematic mapping study (SMS), articles published between 2011 and 2020 in major online scientific databases, including IEEE Xplore, Scopus, ACM, ScienceDirect and EBSCOhost, were identified and analyzed. A total of 576 articles were found, but only 72 were included after several selection steps following the SLR procedure. The results are presented and mapped according to the designed research questions (RQs). In addition, awareness of current trends in the OD setting can have a real impact on the computing community by highlighting the latest developments and the need for future research, especially for those dealing with the OD and ML revolution.
Acknowledgement
The authors wish to thank Universiti Sains Malaysia (USM) and Universiti Malaysia Perlis (UniMAP) for the support they extended in the completion of this research.
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Ismail, N., Yusof, U.K. (2021). Open Data in Prediction Using Machine Learning: A Systematic Review. In: Saeed, F., Mohammed, F., Al-Nahari, A. (eds) Innovative Systems for Intelligent Health Informatics. IRICT 2020. Lecture Notes on Data Engineering and Communications Technologies, vol 72. Springer, Cham. https://doi.org/10.1007/978-3-030-70713-2_50
DOI: https://doi.org/10.1007/978-3-030-70713-2_50
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-70712-5
Online ISBN: 978-3-030-70713-2
eBook Packages: Intelligent Technologies and Robotics (R0)