Machine Learning-Based Early Diabetes Prediction

James, Deepa Elizabeth; Vimina, E. R.

doi:10.1007/978-981-16-2422-3_52


Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 213)


Abstract

Diabetes mellitus is among the most critical diseases the world currently faces. Current diagnostic practice involves various tests at a laboratory or hospital, with treatment based on the outcome of the diagnosis. This study proposes a machine learning model that classifies a patient as diabetic or non-diabetic using the popular PIMA Indian dataset. The dataset contains features such as Pregnancy, Blood Pressure, Skin Thickness, Age and Diabetes Pedigree Function, along with routine factors such as Glucose, BMI and Insulin. The objective of the study is to apply several pre-processing techniques that improve accuracy over simple models. The study compares several classification models, namely GaussianNB, Logistic Regression, KNN, Decision Tree Classifier, Random Forest Classifier and Gradient Boosting Classifier, in several ways. First, missing values in the significant features are replaced with the median of each input variable, computed separately for diabetic and non-diabetic patients. Next, feature engineering adds new features obtained by categorizing existing features according to their ranges. Finally, hyperparameter tuning is carried out to optimize the model. Accuracy and area under the ROC curve (AUC) are used to validate the effectiveness of the proposed framework. Results indicate that the XGBoost classifier is the best-performing model, with 88% accuracy and an AUC of 0.948. The performance of the model is evaluated using the confusion matrix and the ROC curve.
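The two pre-processing steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the toy rows, the convention that a zero marks a missing value, the helper names `impute_by_outcome_median` and `bmi_category`, and the BMI cut-offs are all assumptions made for illustration.

```python
from statistics import median

# Toy rows in the spirit of the PIMA data; all values are invented.
# Assumption: a 0 in "Glucose" or "BMI" stands for a missing measurement.
rows = [
    {"Glucose": 148, "BMI": 33.6, "Outcome": 1},
    {"Glucose": 0,   "BMI": 43.1, "Outcome": 1},
    {"Glucose": 183, "BMI": 0,    "Outcome": 1},
    {"Glucose": 85,  "BMI": 26.6, "Outcome": 0},
    {"Glucose": 0,   "BMI": 28.1, "Outcome": 0},
    {"Glucose": 89,  "BMI": 0,    "Outcome": 0},
]


def impute_by_outcome_median(rows, feature, target="Outcome"):
    """Replace zeros in `feature` with the median of the non-zero values
    observed among rows that share the same target label."""
    medians = {}
    for label in {r[target] for r in rows}:
        observed = [r[feature] for r in rows
                    if r[target] == label and r[feature] != 0]
        medians[label] = median(observed)  # raises if a class has no observed value
    return [
        {**r, feature: medians[r[target]] if r[feature] == 0 else r[feature]}
        for r in rows
    ]


def bmi_category(bmi):
    """Map a BMI value to a coarse range label (cut-offs are illustrative)."""
    if bmi < 18.5:
        return "underweight"
    if bmi < 25:
        return "normal"
    if bmi < 30:
        return "overweight"
    return "obese"


# Step 1: class-conditional median imputation for each affected feature.
cleaned = impute_by_outcome_median(rows, "Glucose")
cleaned = impute_by_outcome_median(cleaned, "BMI")

# Step 2: add a categorical feature derived from the range of an existing one.
engineered = [{**r, "BMI_Category": bmi_category(r["BMI"])} for r in cleaned]

for r in engineered:
    print(r)
```

Because the fill value depends on the outcome label, a step like this would normally be fitted on training rows only, since the label is not available for a new patient at prediction time.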



Author information

Authors and Affiliations

Amrita School of Arts and Science, Amrita Vishwa Vidyapeetham, Kochi Campus, Kochi, India
Deepa Elizabeth James & E. R. Vimina


Corresponding author

Correspondence to Deepa Elizabeth James.

Editor information

Editors and Affiliations

Gnanmani College of Engineering and Technology, Namakkal, Tamil Nadu, India
Jennifer S. Raj
Department of Business Administration, The Gerald Schwartz School of Business, St. Francis Xavier University, Nova Scotia, NS, Canada
Ram Palanisamy
Department of Computer Engineering and Informatics, University of Patras, Patra, Greece
Isidoros Perikos
Department of Computer Science, Kennesaw State University, Kennesaw, GA, USA
Yong Shi


Cite this paper

James, D.E., Vimina, E.R. (2022). Machine Learning-Based Early Diabetes Prediction. In: Raj, J.S., Palanisamy, R., Perikos, I., Shi, Y. (eds) Intelligent Sustainable Systems. Lecture Notes in Networks and Systems, vol 213. Springer, Singapore. https://doi.org/10.1007/978-981-16-2422-3_52


DOI: https://doi.org/10.1007/978-981-16-2422-3_52
Published: 27 August 2021
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-2421-6
Online ISBN: 978-981-16-2422-3
eBook Packages: Intelligent Technologies and Robotics (R0)
