Healthcare Support Using Data Mining: A Case Study on Stroke Prediction

Artificial Intelligence and Machine Learning for Healthcare

Abstract

Over the past few decades, data and information have become valuable assets for organizations of every size. Data underpins strategic decision-making and can give businesses a significant advantage over their competitors, by providing customized services or a better overall experience to their customers and by attracting new ones. Data mining techniques are used for this purpose, so that valuable information can be discovered and exploited. Healthcare generates a vast amount of data that traditional methods do not fully exploit because of its complexity, velocity, and volume. There is therefore a demand for powerful automated data mining tools that can make full use of these data and uncover patterns and knowledge about patients, medical claims, treatment costs, hospitals, etc. This work focuses on the best-known data mining techniques: classification, clustering, and association rule mining, which are used extensively in the healthcare industry for incident prediction and general medical knowledge acquisition. The data mining process comprises several steps, such as data selection, pre-processing, transformation, interpretation, and evaluation. The experimental section uses a stroke incidents dataset obtained from Kaggle. The chapter also provides a literature survey of data mining applications in the healthcare sector and discusses the abovementioned machine learning concepts.
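As a rough illustration of the process steps named in the abstract, the sketch below walks through selection, pre-processing, transformation, a classification step, and evaluation in plain Python. It is not the chapter's method: the records are hand-made toy values (not the Kaggle dataset), the function names are hypothetical, and 1-nearest-neighbour is used only as a simple stand-in classifier. The interpretation step is omitted.

```python
# Toy, hand-made records (hypothetical; NOT taken from the Kaggle stroke dataset).
# Each record: (age, bmi or None if missing, hypertension flag, stroke label)
RECORDS = [
    (67, 36.6, 0, 1),
    (80, 32.5, 1, 1),
    (74, None, 1, 1),
    (79, 24.0, 1, 1),
    (35, 22.4, 0, 0),
    (28, None, 0, 0),
    (42, 27.1, 0, 0),
    (51, 30.2, 0, 0),
]


def select(records):
    """Data selection: split the feature columns from the target label."""
    features = [list(r[:3]) for r in records]
    labels = [r[3] for r in records]
    return features, labels


def impute_mean(features, column):
    """Pre-processing: replace missing values in one column with the column mean."""
    known = [row[column] for row in features if row[column] is not None]
    mean = sum(known) / len(known)
    return [
        [mean if (i == column and v is None) else v for i, v in enumerate(row)]
        for row in features
    ]


def min_max_scale(features):
    """Transformation: rescale every column to the [0, 1] range."""
    columns = list(zip(*features))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in features
    ]


def predict_1nn(train_x, train_y, query):
    """Modelling: 1-nearest-neighbour classification by squared Euclidean distance."""
    distances = [sum((a - b) ** 2 for a, b in zip(row, query)) for row in train_x]
    return train_y[distances.index(min(distances))]


def accuracy(predicted, actual):
    """Evaluation: fraction of predictions that match the true labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def run_pipeline(records, test_indices):
    """Run all steps and return accuracy on the held-out records.

    For brevity, imputation and scaling use all records; a careful study
    would fit them on the training records only.
    """
    features, labels = select(records)
    features = min_max_scale(impute_mean(features, column=1))
    train_x = [x for i, x in enumerate(features) if i not in test_indices]
    train_y = [y for i, y in enumerate(labels) if i not in test_indices]
    test_x = [features[i] for i in sorted(test_indices)]
    test_y = [labels[i] for i in sorted(test_indices)]
    predicted = [predict_1nn(train_x, train_y, q) for q in test_x]
    return accuracy(predicted, test_y)


if __name__ == "__main__":
    print(run_pipeline(RECORDS, test_indices={3, 7}))
```

With only eight invented records, the resulting accuracy says nothing about real stroke prediction; the point is the order of the steps and how the output of each one feeds the next.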

Abbreviations

Term: Definition
BMI: Body Mass Index
CRISP-DM: Cross-Industry Standard Process for Data Mining
CRM: Customer Relationship Management
Data mining algorithm: A set of heuristics and calculations that creates a model from data
DBSCAN: Density-Based Spatial Clustering of Applications with Noise
Feature: An independent variable that can be used as input to a machine learning model
FP-Growth: Frequent Pattern Growth
GDPR: General Data Protection Regulation
LEADERS: Lightweight Epidemiological Advanced Detection Emergency Response System
Machine learning model: An expression of an algorithm that processes data to explore patterns or make predictions
OLAP: Online Analytical Processing
Operator: In RapidMiner, a group of functions that perform actions on input data through parameters and output the results of those actions
Parameter: A special kind of variable used in an algorithm to refer to one of the pieces of data provided as input to it
Process: In RapidMiner, a set of sequentially connected operators represented by a flow design, where each operator provides its output as input to the next one
ROC: Receiver Operating Characteristic
SEMMA: Sample, Explore, Modify, Model, Assess
SVM: Support Vector Machine

Acknowledgements

We would like to express our gratitude to the anonymous reviewers who provided critical feedback during the preparation of this manuscript. Their remarks and recommendations significantly improved the quality of this work.

Author information

Corresponding author

Correspondence to Christos Tjortjis.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Michailidis, G., Vlachos-Giovanopoulos, M., Koukaras, P., Tjortjis, C. (2023). Healthcare Support Using Data Mining: A Case Study on Stroke Prediction. In: Lim, C.P., Vaidya, A., Chen, YW., Jain, V., Jain, L.C. (eds) Artificial Intelligence and Machine Learning for Healthcare. Intelligent Systems Reference Library, vol 229. Springer, Cham. https://doi.org/10.1007/978-3-031-11170-9_4

  • DOI: https://doi.org/10.1007/978-3-031-11170-9_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-11169-3

  • Online ISBN: 978-3-031-11170-9

  • eBook Packages: Engineering, Engineering (R0)
