Abstract
Data and information have become valuable assets for organizations of all sizes over the past few decades. Data is the main ingredient of strategic decision-making and can give businesses a significant advantage over their competitors by enabling customized services and a better overall experience for existing customers while attracting new ones. For this purpose, data mining techniques are used to discover and exploit valuable information. The healthcare field generates a vast amount of data that traditional methods do not fully exploit because of its complexity, velocity, and volume. There is therefore a demand for powerful automated data mining tools that make full use of these data and uncover patterns and valuable knowledge about patients, medical claims, treatment costs, hospitals, etc. This work focuses on the best-known data mining techniques, namely classification, clustering, and association rule mining, which are used extensively in the healthcare industry for incident prediction and general medical knowledge acquisition. The data mining process comprises several steps, such as data selection, pre-processing, transformation, interpretation, and evaluation. The experimental section uses a stroke incidents dataset obtained from Kaggle. This chapter also provides a literature survey on data mining applications in the healthcare sector and discusses the abovementioned machine learning concepts.
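The process steps named above (selection, pre-processing, transformation, modelling, evaluation) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the records are made up and are not rows from the Kaggle stroke dataset, the helper names are hypothetical, and a one-attribute threshold rule (a decision stump) stands in for the classifiers the chapter actually evaluates.

```python
# Made-up toy records for illustration only; NOT rows from the Kaggle stroke dataset.
RAW = [
    {"id": 1, "age": 71, "bmi": 31.0, "stroke": 1},
    {"id": 2, "age": 34, "bmi": None, "stroke": 0},
    {"id": 3, "age": 65, "bmi": 28.5, "stroke": 1},
    {"id": 4, "age": 45, "bmi": 24.0, "stroke": 0},
    {"id": 5, "age": 80, "bmi": None, "stroke": 1},
    {"id": 6, "age": 29, "bmi": 22.0, "stroke": 0},
    {"id": 7, "age": 58, "bmi": 27.0, "stroke": 0},
    {"id": 8, "age": 76, "bmi": 30.0, "stroke": 1},
]


def select(rows, columns):
    """Data selection: keep only the attributes of interest."""
    return [{c: row[c] for c in columns} for row in rows]


def impute_mean(rows, column):
    """Pre-processing: replace missing values with the mean of the known ones."""
    known = [row[column] for row in rows if row[column] is not None]
    fill = sum(known) / len(known)
    return [{**row, column: fill if row[column] is None else row[column]}
            for row in rows]


def min_max_scale(rows, column):
    """Transformation: rescale a numeric column to the range [0, 1]."""
    values = [row[column] for row in rows]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant column
    return [{**row, column: (row[column] - lo) / span} for row in rows]


def fit_stump(rows, column, target):
    """Modelling: pick the threshold with the best training accuracy.

    The stump predicts the positive class when the value is >= threshold.
    """
    best_threshold, best_accuracy = None, -1.0
    for candidate in sorted({row[column] for row in rows}):
        hits = sum((row[column] >= candidate) == bool(row[target]) for row in rows)
        accuracy = hits / len(rows)
        if accuracy > best_accuracy:
            best_threshold, best_accuracy = candidate, accuracy
    return best_threshold


def evaluate(rows, column, target, threshold):
    """Evaluation: accuracy, sensitivity and specificity on the given rows."""
    tp = fp = tn = fn = 0
    for row in rows:
        predicted, actual = row[column] >= threshold, bool(row[target])
        if predicted and actual:
            tp += 1
        elif predicted and not actual:
            fp += 1
        elif not predicted and actual:
            fn += 1
        else:
            tn += 1
    return {
        "accuracy": (tp + tn) / len(rows),
        "sensitivity": tp / (tp + fn) if (tp + fn) else None,
        "specificity": tn / (tn + fp) if (tn + fp) else None,
    }


# For brevity the toy pipeline imputes and scales before splitting; in real work
# these parameters would be estimated on the training split only.
data = min_max_scale(impute_mean(select(RAW, ["age", "bmi", "stroke"]), "bmi"), "age")
train, test = data[:6], data[6:]
threshold = fit_stump(train, "age", "stroke")
metrics = evaluate(test, "age", "stroke", threshold)
print(threshold, metrics)
```

With so few records the resulting metrics say nothing about stroke prediction; the sketch only shows how each step feeds the next.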
Abbreviations
- BMI: Body Mass Index
- CRISP-DM: Cross-Industry Standard Process for Data Mining
- CRM: Customer Relationship Management
- Data mining algorithm: A set of heuristics and calculations that creates a model from data
- DBSCAN: Density-Based Spatial Clustering of Applications with Noise
- Feature: An independent variable that can be used as input to a machine learning model
- FP-Growth: Frequent Pattern Growth
- GDPR: General Data Protection Regulation
- LEADERS: Lightweight Epidemiological Advanced Detection Emergency Response System
- Machine learning model: An expression of an algorithm that processes data to explore patterns or make predictions
- OLAP: Online Analytical Processing
- Operator: In RapidMiner, a group of functions that perform actions on input data through parameters and output the results of those actions
- Parameter: A special kind of variable used in an algorithm to refer to one of the pieces of data provided as input to it
- Process: In RapidMiner, a set of sequentially connected operators represented by a flow design, where each operator provides its output as input to the next one
- ROC: Receiver Operating Characteristic
- SEMMA: Sample, Explore, Modify, Model, Assess
- SVM: Support Vector Machine
Acknowledgements
We would like to express our gratitude to the anonymous reviewers who provided critical feedback during the preparation of this manuscript. Their remarks and recommendations significantly improved the quality of this work.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Michailidis, G., Vlachos-Giovanopoulos, M., Koukaras, P., Tjortjis, C. (2023). Healthcare Support Using Data Mining: A Case Study on Stroke Prediction. In: Lim, C.P., Vaidya, A., Chen, YW., Jain, V., Jain, L.C. (eds) Artificial Intelligence and Machine Learning for Healthcare. Intelligent Systems Reference Library, vol 229. Springer, Cham. https://doi.org/10.1007/978-3-031-11170-9_4
DOI: https://doi.org/10.1007/978-3-031-11170-9_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-11169-3
Online ISBN: 978-3-031-11170-9
eBook Packages: Engineering, Engineering (R0)