ABSTRACT
Introduction Serial functional status assessments are critical to heart failure (HF) management but are often described narratively in documentation, limiting their use in quality improvement or patient selection for clinical trials. We developed and validated a deep learning-based natural language processing (NLP) strategy to extract functional status assessments from unstructured clinical notes.
Methods We identified 26,577 HF patients across outpatient services at Yale New Haven Hospital (YNHH), Greenwich Hospital (GH), and Northeast Medical Group (NMG) (mean age 76.1 years; 52.0% women). We used expert annotated notes from YNHH for model development/internal testing and from GH and NMG for external validation. The primary outcomes were NLP models to detect (a) explicit New York Heart Association (NYHA) classification, (b) HF symptoms during activity or rest, and (c) functional status assessment frequency.
Results Among 3,000 expert-annotated notes, 13.6% mentioned NYHA class, and 26.5% described HF symptoms. The model to detect NYHA classes achieved a class-weighted AUROC of 0.99 (95% CI: 0.98-1.00) at YNHH, 0.98 (0.96-1.00) at NMG, and 0.98 (0.92-1.00) at GH. The activity-related HF symptom model achieved an AUROC of 0.94 (0.89-0.98) at YNHH, 0.94 (0.91-0.97) at NMG, and 0.95 (0.92-0.99) at GH. Deploying the NYHA model among 166,655 unannotated notes from YNHH identified 21,528 (12.9%) with NYHA mentions and 17,642 encounters (10.5%) classifiable into functional status groups based on activity-related symptoms.
Conclusions We developed and validated an NLP approach to extract NYHA classification and activity-related HF symptoms from clinical notes, enhancing the ability to track optimal care and identify trial-eligible patients.
Competing Interest Statement
Dr. Krumholz reported receiving expenses and/or personal fees from UnitedHealthcare, Element Science, Inc, Aetna Inc, Reality Labs, Tesseract/4 Catalyst, F-Prime, the Siegfried & Jensen law firm, the Arnold & Porter law firm, and the Martin Baughman, PLLC; being a cofounder of Refactor Health and Hugo Health; and being associated with contracts through Yale New Haven Hospital from the Centers for Medicare & Medicaid Services and through Yale University from Johnson & Johnson outside the submitted work. Dr. Khera is an Associate Editor of JAMA. He also receives research support, through Yale, from Bristol-Myers Squibb, Novo Nordisk, and BridgeBio. He is a coinventor of U.S. Pending Patent Applications 63/562,335, 63/177,117, 63/428,569, 63/346,610, 63/484,426, 63/508,315, and 63/606,203. He is a co-founder of Ensight-AI, Inc. and Evidence2Health, health platforms to improve cardiovascular diagnosis and evidence-based cardiovascular care.
Funding Statement
Dr. Khera was supported by the National Heart, Lung, and Blood Institute of the National Institutes of Health (under awards R01HL167858 and K23HL153775) and the Doris Duke Charitable Foundation (under award 2022060).
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The Yale Institutional Review Board reviewed the study, which approved the protocol and waived the need for informed consent, as it represents a secondary analysis of existing data.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data Availability
The data cannot be publicly shared as it represents protected health information and sharing data will be a violation of patient privacy.