Open Data in Prediction Using Machine Learning: A Systematic Review

  • Conference paper
Innovative Systems for Intelligent Health Informatics (IRICT 2020)

Abstract

This study discusses the determinants of open data (OD) in prediction using machine learning (ML) by reviewing the current research landscape. With the rapid growth of open government data (OGD) and social networking services (SNSs), OD is considered the most significant trend for users seeking to enhance their decision-making. The purpose of the study was to identify the proliferation of OD in ML approaches for generating decisions through a systematic literature review (SLR), and to map the outcomes as trends. In this systematic mapping study (SMS), articles published between 2011 and 2020 in major online scientific databases, including IEEE Xplore, Scopus, ACM, ScienceDirect and EBSCOhost, were identified and analyzed. A total of 576 articles were found, of which only 72 were included after several selection stages following the SLR procedure. The results are presented and mapped according to the designed research questions (RQs). Awareness of current trends in the OD setting can have a real impact on the computing community by highlighting the latest developments and the need for future research, especially for those dealing with the OD and ML revolution.

Acknowledgement

The authors wish to thank Universiti Sains Malaysia (USM) and Universiti Malaysia Perlis (UniMAP) for the support they extended in the completion of the present research.

Author information

Corresponding author

Correspondence to Norismiza Ismail.

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Ismail, N., Yusof, U.K. (2021). Open Data in Prediction Using Machine Learning: A Systematic Review. In: Saeed, F., Mohammed, F., Al-Nahari, A. (eds) Innovative Systems for Intelligent Health Informatics. IRICT 2020. Lecture Notes on Data Engineering and Communications Technologies, vol 72. Springer, Cham. https://doi.org/10.1007/978-3-030-70713-2_50
