ABSTRACT
The application of machine learning (ML) tools in electronic health records (EHRs) can help reduce the underdiagnosis of dementia, but models that are not designed to reflect minority population may perpetuate that underdiagnosis. To address the underdiagnosis of dementia in both Black Americans (BAs) and white Americans (WAs), we sought to develop and validate ML models that assign race-specific risk scores. These scores were used to identify undiagnosed dementia in BA and WA Veterans in EHRs. More specifically, risk scores were generated separately for BAs (n=10K) and WAs (n=10K) in training samples of cases and controls by performing ML, equivalence mapping, topic modeling, and a support vector-machine (SVM) in structured and unstructured EHR data. Scores were validated via blinded manual chart reviews (n=1.2K) of controls from a separate sample (n=20K). AUCs and negative and positive predictive values (NPVs and PPVs) were calculated to evaluate the models. There was a strong positive relationship between SVM-generated risk scores and undiagnosed dementia. BAs were more likely than WAs to have undiagnosed dementia per chart review, both overall (15.3% vs 9.5%) and among Veterans with >90th percentile cutoff scores (25.6% vs 15.3%). With chart reviews as the reference standard and varied cutoff scores, the BA model performed slightly better than the WA model (AUC=0.86 with NPV=0.98 and PPV=0.26 at >90th percentile cutoff vs AUC=0.77 with NPV=0.98 and PPV=0.15 at >90th). The AUCs, NPVs, and PPVs suggest that race-specific ML models can assist in the identification of undiagnosed dementia, particularly in BAs. Future studies should investigate implementing EHR-based risk scores in clinics that serve both BA and WA Veterans.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This work was supported by NIA R56 AG059739 and was supported in part by the U. S. Department of Veterans Affairs Office of Research and Development Biomedical Laboratory Research Program.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The IRBs of the VA Puget Sound Health Care System and Washington VA Medical Center gave ethical approval for this work.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Footnotes
AUTHOR NOTE
Current author affiliations were updated, and Tables 2 and 4 received minor updates.
Data Availability
The VA EHR data resides in VINCI behind VA firewalls. VA-approved investigators can access the data. SVM algorithms can be made available to interested qualified investigators.