Development and validation of medical record-based logistic regression and machine learning models to diagnose diabetic retinopathy

  • Retinal Disorders
Graefe's Archive for Clinical and Experimental Ophthalmology

Abstract

Purposes

Many factors have been reported to be associated with diabetic retinopathy (DR); however, their contributions remain unclear. We aimed to evaluate the prognostic and diagnostic accuracy of logistic regression and three machine learning models based on various medical records.

Methods

This was a cross-sectional study. We investigated the prevalence and associations of DR among 757 participants aged 40 years or older in the 2005–2006 National Health and Nutrition Examination Survey (NHANES). We trained the models with 15 predictor variables to predict whether participants had DR. The area under the receiver operating characteristic curve (AUROC) and mean squared error (MSE) of each algorithm were compared on an external validation dataset, a replicate cohort from NHANES 2007–2008.
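
To make the two comparison metrics concrete, here is a minimal Python sketch that computes AUROC and MSE from predicted probabilities. It is not the authors' code. The abstract does not state how MSE was calculated, so the sketch assumes it is the mean squared difference between the predicted probability and the 0/1 outcome (the Brier score). The labels and scores are made-up toy values, not NHANES data, and the function names are hypothetical.

```python
def auroc(y_true, y_score):
    """Rank-based AUROC: the probability that a randomly chosen positive
    case receives a higher score than a randomly chosen negative case
    (ties count as half a win)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mse(y_true, y_prob):
    """Mean squared error between predicted probabilities and 0/1 labels
    (assumed definition; equivalent to the Brier score)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Toy data (hypothetical): 1 = DR, 0 = no DR.
y_true = [0, 0, 0, 1, 0, 1]
y_score = [0.1, 0.2, 0.7, 0.8, 0.3, 0.6]

print(f"AUROC = {auroc(y_true, y_score):.3f}")  # 7 of 8 positive/negative pairs ranked correctly
print(f"MSE   = {mse(y_true, y_score):.3f}")
```

A higher AUROC indicates better discrimination, while a lower MSE indicates predicted probabilities closer to the observed outcomes, so the two metrics are read in opposite directions when comparing models.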

Results

Among the 757 participants, 53 (7.00%) had DR, the mean (standard deviation, SD) age was 57.7 (13.04), and 78.0% were male (n = 42). Logistic regression revealed that female gender (OR = 4.130, 95% CI: 1.820–9.380; P < 0.05), HbA1c (OR = 1.665, 95% CI: 1.197–2.317; P < 0.05), serum creatine level (OR = 2.952, 95% CI: 1.274–6.851; P < 0.05), and eGFR level (OR = 1.009, 95% CI: 1.000–1.014; P < 0.05) increased the risk of DR. The average performance obtained from internal validation was similar in all models (AUROC ≥ 0.945), and k-nearest neighbors (KNN) had the highest value with an AUROC of 0.984. In external validation, the models remained robust or showed only modest reductions in discrimination (AUROC ≥ 0.902), and KNN again performed best with an AUROC of 0.982. Both logistic regression and machine learning models performed well in the clinical diagnosis of DR.
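
For readers less familiar with how odds ratios relate to logistic regression coefficients, the following Python sketch shows the standard relationship OR = exp(β), with a Wald-type 95% CI of exp(β ± 1.96·SE). The abstract does not say how the intervals were computed, so the Wald form is an assumption, and the function names are hypothetical. The example back-calculates β and SE from the reported HbA1c result (OR = 1.665, 95% CI: 1.197–2.317) and then rebuilds the interval as a consistency check.

```python
import math


def odds_ratio_ci(beta, se, z=1.96):
    """Odds ratio and Wald confidence interval from a logistic
    regression coefficient and its standard error."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def coef_from_or_ci(odds_ratio, lower, upper, z=1.96):
    """Recover the coefficient and standard error implied by a reported
    odds ratio and Wald interval (assumes symmetry on the log scale)."""
    beta = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se


# Reported HbA1c estimate from the Results paragraph.
beta, se = coef_from_or_ci(1.665, 1.197, 2.317)
or_hat, lo, hi = odds_ratio_ci(beta, se)

print(f"beta = {beta:.4f}, SE = {se:.4f}")
print(f"OR = {or_hat:.3f}, 95% CI: {lo:.3f} to {hi:.3f}")
```

Under this reading, each one-unit increase in HbA1c multiplies the odds of DR by about 1.665, holding the other predictors fixed.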

Conclusions

This study highlights the utility of comparing traditional logistic regression to machine learning models. We found that logistic regression performed as well as optimized machine learning methods when classifying DR patients.

Data availability

Data were acquired from the National Health and Nutrition Examination Survey (https://www.cdc.gov/nchs/nhanes/).

Abbreviations

DR: Diabetic retinopathy
Scr: Serum creatine
eGFR: Estimated glomerular filtration rate
NHANES: National Health and Nutrition Examination Survey
AUROC: Area under the receiver operating characteristic
MSE: Mean squared error
SD: Standard deviation
OR: Odds ratio
KNN: K-nearest neighbors
NPDR: Non-proliferative diabetic retinopathy
PDR: Proliferative diabetic retinopathy
LR: Logistic regression
RF: Random forest
SVM: Support vector machine

Funding

This study was supported by the National Natural Science Foundation of China (82220108017, 82141128); The Capital Health Research and Development of Special (2020–1-2052); Science & Technology Project of Beijing Municipal Science & Technology Commission (Z201100005520045, Z181100001818003).

Author information

Contributions

W.B. Wei, H.Y. Li, and L. Dong designed the study. H.Y. Li, L. Dong, and R.H. Zhang wrote the manuscript. H.Y. Li, C.Y. Yu, W.D. Zhou, H.T. Wu, and Y.T. Li collected the data and conducted the analyses. W.B. Wei edited and revised the manuscript. All authors have approved the submitted version and agreed with the contributions declarations.

Corresponding author

Correspondence to Wen-Bin Wei.

Ethics declarations

Ethics approval and consent to participate

Ethics approval and informed consent were not required for this study because the data are publicly accessible.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Li, HY., Dong, L., Zhou, WD. et al. Development and validation of medical record-based logistic regression and machine learning models to diagnose diabetic retinopathy. Graefes Arch Clin Exp Ophthalmol 261, 681–689 (2023). https://doi.org/10.1007/s00417-022-05854-9

  • DOI: https://doi.org/10.1007/s00417-022-05854-9
