Development and validation of medical record-based logistic regression and machine learning models to diagnose diabetic retinopathy

Li, He-Yan; Dong, Li; Zhou, Wen-Da; Wu, Hao-Tian; Zhang, Rui-Heng; Li, Yi-Tong; Yu, Chu-Yao; Wei, Wen-Bin

doi:10.1007/s00417-022-05854-9

Retinal Disorders
Published: 14 October 2022

Graefe's Archive for Clinical and Experimental Ophthalmology, Volume 261, pages 681–689 (2023)

He-Yan Li¹, Li Dong¹, Wen-Da Zhou¹, Hao-Tian Wu¹, Rui-Heng Zhang¹, Yi-Tong Li¹, Chu-Yao Yu¹ & Wen-Bin Wei¹ (ORCID: orcid.org/0000-0003-2386-0989)

Abstract

Purposes

Many factors were reported to be associated with diabetic retinopathy (DR); however, their contributions remained unclear. We aimed to evaluate the prognostic and diagnostic accuracy of logistic regression and three machine learning models based on various medical records.

Methods

This was a cross-sectional study. We investigated the prevalence and associations of DR among 757 participants aged 40 years or older in the 2005–2006 National Health and Nutrition Examination Survey (NHANES). We trained the models on 15 predictor variables to predict whether each participant had DR. The area under the receiver operating characteristic curve (AUROC) and the mean squared error (MSE) of each algorithm were compared in an external validation dataset, a replicate cohort from NHANES 2007–2008.
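The two comparison metrics named in the Methods can be illustrated with a minimal sketch. It assumes that MSE is taken between the 0/1 DR label and the predicted probability (equivalent to the Brier score), which the abstract does not state; the function names and the four example cases are hypothetical and are not NHANES data.

```python
from typing import Sequence


def auroc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUROC as the probability that a random positive case is scored
    above a random negative case; ties count as one half."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mse(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared error between 0/1 labels and predicted probabilities
    (assumed definition; identical to the Brier score)."""
    if len(y_true) != len(y_prob):
        raise ValueError("labels and predictions must have the same length")
    return sum((y - p) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Hypothetical labels (1 = DR) and predicted probabilities, for illustration only.
labels = [0, 0, 1, 1]
probs = [0.1, 0.4, 0.35, 0.8]
print(f"AUROC = {auroc(labels, probs):.3f}")  # AUROC = 0.750
print(f"MSE   = {mse(labels, probs):.3f}")    # MSE   = 0.158
```

The pairwise form of AUROC is quadratic in the number of cases, which is acceptable for cohorts of this size; a rank-based formulation gives the same value more efficiently.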

Results

Among the 757 participants, 53 (7.00%) had DR; the mean (standard deviation, SD) age was 57.7 (13.04) years, and 78.0% were male (n = 42). Logistic regression showed that female gender (OR = 4.130, 95% CI: 1.820–9.380; P < 0.05), HbA1c (OR = 1.665, 95% CI: 1.197–2.317; P < 0.05), serum creatinine level (OR = 2.952, 95% CI: 1.274–6.851; P < 0.05), and eGFR level (OR = 1.009, 95% CI: 1.000–1.014; P < 0.05) were each associated with higher odds of DR. Average performance in internal validation was similar across all models (AUROC ≥ 0.945), with k-nearest neighbors (KNN) achieving the highest AUROC of 0.984. In external validation, discrimination was preserved or only modestly reduced (AUROC ≥ 0.902), and KNN again performed best with an AUROC of 0.982. Both logistic regression and machine learning models performed well in the clinical diagnosis of DR.
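The relation between a reported odds ratio and its confidence interval can be sketched as follows. The sketch assumes a Wald interval, i.e. OR = exp(β) with 95% CI exp(β ± z·SE), which the abstract does not confirm; the helper names are hypothetical. It recovers β and SE from the HbA1c estimate reported above and rebuilds the interval as a consistency check.

```python
import math

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal distribution


def beta_se_from_or(odds_ratio: float, ci_low: float, ci_high: float,
                    z: float = Z_95) -> tuple[float, float]:
    """Recover the log-odds coefficient and its standard error from a
    reported OR and CI, assuming a Wald interval on the log scale."""
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    return beta, se


def odds_ratio_ci(beta: float, se: float,
                  z: float = Z_95) -> tuple[float, float, float]:
    """Return (OR, lower bound, upper bound) for a coefficient and its SE."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# HbA1c estimate as reported in the abstract: OR = 1.665, 95% CI 1.197-2.317.
beta, se = beta_se_from_or(1.665, 1.197, 2.317)
or_, low, high = odds_ratio_ci(beta, se)
print(f"beta = {beta:.3f}, SE = {se:.3f}")
print(f"OR = {or_:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```

Under the Wald assumption the point estimate is the geometric mean of the interval bounds, so a rebuilt interval that closely matches the reported one suggests the published figures are internally consistent up to rounding.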

Conclusions

This study highlights the utility of comparing traditional logistic regression to machine learning models. We found that logistic regression performed as well as optimized machine learning methods when classifying DR patients.

Data availability

Data were acquired from the National Health and Nutrition Examination Survey (https://www.cdc.gov/nchs/nhanes/).

Abbreviations

DR: Diabetic retinopathy
Scr: Serum creatinine
eGFR: Estimated glomerular filtration rate
NHANES: National Health and Nutrition Examination Survey
AUROC: Area under the receiver operating characteristic curve
MSE: Mean squared error
SD: Standard deviation
OR: Odds ratio
KNN: K-nearest neighbors
NPDR: Non-proliferative diabetic retinopathy
PDR: Proliferative diabetic retinopathy
LR: Logistic regression
RF: Random forest
SVM: Support vector machine

Funding

This study was supported by the National Natural Science Foundation of China (82220108017, 82141128); The Capital Health Research and Development of Special (2020–1-2052); Science & Technology Project of Beijing Municipal Science & Technology Commission (Z201100005520045, Z181100001818003).

Author information

Authors and Affiliations

Beijing Tongren Eye Center, Beijing Key Laboratory of Intraocular Tumor Diagnosis and Treatment, Beijing Ophthalmology & Visual Sciences Key Lab, Medical Artificial Intelligence Research and Verification Key Laboratory of the Ministry of Industry and Information Technology, Beijing Tongren Hospital, Capital Medical University, 1 Dong Jiao Min Lane, Beijing, 100730, China
He-Yan Li, Li Dong, Wen-Da Zhou, Hao-Tian Wu, Rui-Heng Zhang, Yi-Tong Li, Chu-Yao Yu & Wen-Bin Wei

Contributions

W.B. Wei, H.Y. Li, and L. Dong designed the study. H.Y. Li, L. Dong, and R.H. Zhang wrote the manuscript. H.Y. Li, C.Y. Yu, W.D. Zhou, H.T. Wu, and Y.T. Li collected the data and conducted the analyses. W.B. Wei edited and revised the manuscript. All authors approved the submitted version and agreed with the contribution declarations.

Corresponding author

Correspondence to Wen-Bin Wei.

Ethics declarations

Ethics approval and consent to participate

Ethics approval and informed consent were not required for this study because of public accessibility to the data.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (PNG 70 KB)

Supplementary file2 (PNG 58 KB)

Supplementary file3 (PNG 49 KB)

Supplementary file4 (DOCX 15 KB)

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Li, HY., Dong, L., Zhou, WD. et al. Development and validation of medical record-based logistic regression and machine learning models to diagnose diabetic retinopathy. Graefes Arch Clin Exp Ophthalmol 261, 681–689 (2023). https://doi.org/10.1007/s00417-022-05854-9

Received: 01 July 2022
Revised: 08 September 2022
Accepted: 30 September 2022
Published: 14 October 2022
Issue Date: March 2023
DOI: https://doi.org/10.1007/s00417-022-05854-9
