Accuracy of using natural language processing methods for identifying healthcare-associated infections

https://doi.org/10.1016/j.ijmedinf.2018.06.002

Highlights

  • This study confirmed the feasibility of using natural language processing for the detection of healthcare-associated infections.

  • The performance of HAI detection was satisfactory: overall sensitivity of 83.9% and specificity of 84.2%.

  • The most frequent cause of medical record misclassification was inaccurate temporal labeling of medical events.

  • Improvements in semantic analysis algorithms would increase the detection performance.

  • NLP could offer a new standardized case-finding process for HAI monitoring in hospitals.

Abstract

Objective

There is growing interest in using natural language processing (NLP) for monitoring healthcare-associated infections (HAIs). A French project consortium, SYNODOS, developed an NLP solution for detecting medical events in electronic medical records for epidemiological purposes. The objective of this study was to evaluate the performance of the SYNODOS data processing chain for detecting HAIs in clinical documents.

Materials and methods

Textual records were collected between October 2009 and December 2010 in three French university hospitals (Lyon, Rouen and Nice). The following medical specialties were included in the study: digestive surgery, neurosurgery, orthopedic surgery and adult intensive-care units. Reference standard surveillance was compared with the results of automatic detection using NLP. Sensitivity was calculated on 56 HAI cases and specificity on 57 non-HAI cases.

Results

The accuracy rate was 84% (n = 95/113). The overall sensitivity of automatic detection of HAIs was 83.9% (CI 95%: 71.7–92.4) and the specificity was 84.2% (CI 95%: 72.1–92.5). Sensitivity varied across specialties, from 69.2% (CI 95%: 38.6–90.9) for intensive care to 93.3% (CI 95%: 68.1–99.8) for orthopedic surgery. The manual review of classification errors showed that the most frequent cause was inaccurate temporal labeling of medical events, which is an important factor for HAI detection.
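The reported percentages can be reproduced from a 2×2 confusion matrix. The following Python sketch is illustrative only. The counts (47 of 56 HAI cases detected, 48 of 57 non-HAI cases correctly rejected) are inferred from the reported percentages and are not stated in the abstract. The use of an exact (Clopper-Pearson) binomial interval is likewise an assumption about how the confidence intervals were obtained, and all function names are hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve(f, target, increasing, iters=60):
    """Bisection on [0, 1] for a monotone function f with f(p) = target."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided binomial confidence interval for x successes out of n."""
    if x == 0:
        lower = 0.0
    else:
        # Smallest p for which P(X >= x) reaches alpha / 2.
        lower = _solve(lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2, True)
    if x == n:
        upper = 1.0
    else:
        # Largest p for which P(X <= x) is still alpha / 2.
        upper = _solve(lambda p: binom_cdf(x, n, p), alpha / 2, False)
    return lower, upper


def sensitivity(tp, fn):
    return tp / (tp + fn)


def specificity(tn, fp):
    return tn / (tn + fp)


# Counts inferred from the reported percentages (assumption, see lead-in).
tp, fn = 47, 9   # 56 HAI cases
tn, fp = 48, 9   # 57 non-HAI cases

accuracy = (tp + tn) / (tp + fn + tn + fp)
print(f"accuracy    = {accuracy:.1%}")
print(f"sensitivity = {sensitivity(tp, fn):.1%}, 95% CI = {clopper_pearson(tp, tp + fn)}")
print(f"specificity = {specificity(tn, fp):.1%}, 95% CI = {clopper_pearson(tn, tn + fp)}")
```

Under these assumed counts, 47 + 48 = 95 correctly classified records out of 113, which agrees with the reported accuracy.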

Conclusion

This study confirmed the feasibility of using NLP for HAI detection in hospital facilities. Automatic HAI detection algorithms could offer better surveillance standardization for hospital comparisons.

Introduction

Monitoring of healthcare-associated infections (HAIs) is an essential part of quality-of-care policy in hospitals [1]. Indicators produced by this monitoring (including the incidence of HAIs) allow health professionals to assess whether corrective actions should be implemented and to evaluate their effectiveness. This routine activity represents a significant workload in hospitals because case detection most often relies on manual data collection by the infection control units. The development of automated methods to detect cases of HAIs has significant potential to save time in this work. These methods should also make reporting more exhaustive and more reproducible. Various experiments have been conducted using different sources of data from the hospital information system. The experiments most often described concern the development of detection algorithms that process structured data (bacteriological data, antibiotic prescriptions, biological data) [2–5]. There is growing interest in using the unstructured text of electronic medical records to improve and standardize the collection of adverse event data [6–10].

A previous research project, ALADIN-DTH, carried out between 2009 and 2012 [11], aimed to determine the feasibility of using natural language processing (NLP) for the monitoring of HAIs. The evaluation of the NLP tools developed during this project showed encouraging results in terms of sensitivity and specificity (87.6% and 97.4%, respectively) [12].

However, an error analysis showed that the concepts identified by the semantic tool in clinical notes did not capture the clinical elements necessary for diagnosis with sufficient relevance to allow routine implementation of these methods.

A second project, SYNODOS (SYstème de Normalisation et d’Organisation de Données médicales textuelles pour l’Observation en Santé), funded by the French National Research Agency, was developed following this first feasibility study. One of the objectives of the SYNODOS project was to evaluate the performance of a data processing chain developed for detecting medical events in clinical documents. This assessment was carried out in incremental steps: evaluation of terminology processing of textual documents vs. manual annotation [13,14], evaluation of semantic processing of textual documents, and evaluation of the whole solution on two use cases (healthcare-associated infections and colon cancer). In this article, we present the results of the evaluation of the whole processing chain of the SYNODOS project for the automatic detection of healthcare-associated infections in medical records.

Description of the whole SYNODOS data processing

The SYNODOS project consortium consisted of two university hospital teams, one specialized in hospital epidemiology (Laboratoire de Biométrie et Biologie Evolutive-LBBE, Unité Mixte de Recherche-UMR, Université Claude-Bernard Lyon 1-UCBL, Centre National de la Recherche Scientifique-CNRS 5558), the other specialized in medical terminology (Catalogue et Index des Sites Médicaux de langue Française-CISMeF), and two industrial partners, one expert in linguistic technologies (Holmes Semantic

Description of the study population

The mean age of the 120 patients corresponding to the medical records selected for the evaluation of the SYNODOS project was 56.9 years (standard deviation: 17.7) and 37.5% of them were women.

Performances of the SYNODOS solution in detecting healthcare-associated infections

The distribution of medical records is detailed in Table 1. Of the 120 medical records selected for the evaluation, 7 were excluded because of the absence of a discharge summary in the medical record (n = 3) or an error in the gold standard classification (n = 4) (misclassification of the medical record at the

Discussion

Most studies evaluating the use of NLP for surveillance of HAIs have focused on only one type of infection at a time: catheter-associated urinary tract infection surveillance [7,10] or post-operative surgical site infections [28] or methicillin-resistant Staphylococcus aureus [29]. The aim of our study was to implement NLP algorithms at the hospital level, applying HAI detection algorithms according to the type of ward (type of surgery, intensive care). Mostly these studies were conducted in a

Conclusion

This study confirmed the feasibility of using NLP for HAI detection in hospital facilities. However, improvements in semantic algorithms and expert rules would clearly increase the detection performance of the solution. Further development of this type of NLP processing chain could offer new alternatives in the “shift from within-hospital surveillance to between-hospital comparisons” for hospital benchmarking needs [49], thus offering a standardized case-finding process.

Authors’ contributions

All authors have made substantial contributions to all of the following: (1) the conception and design of the study, acquisition of data, analysis and interpretation of data, (2) drafting the article and revising it critically, and (3) final approval of the version to be submitted.

Conflict of interest

The semantic analyzer was developed by Holmes Semantic Solutions, under the direction of André Bittar. The mediator was developed by Viseo Technologies, under the direction of Frédérique Segond. The other authors declare that they have no conflicts of interest in the research.

Acknowledgments

This work was funded by the Agence Nationale de la Recherche, as part of a TECSAN program (SYNODOS Project ANR-12-TECS-0006).

References (49)

  • A. Chalfine et al. Highly sensitive and efficient computer-assisted system for routine surveillance for surgical site infection. Infect. Control Hosp. Epidemiol. (2006)

  • P. Gastmeier et al. How many nosocomial infections are missed if identification is restricted to patients with either microbiology reports or antibiotic administration? Infect. Control Hosp. Epidemiol. (1999)

  • A. Fong et al. Assessment of automating safety surveillance from electronic health records: analysis for the quality and safety review system. J. Patient Saf. (2017)

  • W. Branch-Elliman et al. Natural language processing for real-time catheter-associated urinary tract infection surveillance: results of a pilot implementation trial. Infect. Control Hosp. Epidemiol. (2015)

  • D. Proux et al. Natural language processing to detect risk patterns related to hospital acquired infections.

  • C. Hagège et al. Linguistic and temporal processing for discovering hospital acquired infection from patient records.

  • A.V. Gundlapalli et al. Detecting the presence of an indwelling urinary catheter and urinary symptoms in hospitalized patients using natural language processing. J. Biomed. Inf. (2016)

  • M.H. Metzger et al. Development of an automated detection tool for healthcare-associated infections based on screening natural language medical reports. AMIA Annu Symp Proc IOS Press (2009)

  • N. Tvardik et al. The terminology needs for evaluation of care pathways through electronic medical records.

  • S. Sakji et al. Evaluation of a French medical multi-terminology indexer for the manual annotation of natural language medical reports of healthcare-associated infections.

  • J. Grosjean et al. Teaching medicine with a terminology/ontology portal. Stud. Health Technol. Inform. (2012)

  • O. Bodenreider. The unified medical language system (UMLS): integrating biomedical terminology. Nucleic Acids Res. (2004)

  • A. Bittar et al. The dangerous myth of the star system. Proceedings of the Ninth International Language Resources and Evaluation Conference (LREC) (2014)

  • J. Ni et al. Fast model adaptation for automated section classification in electronic medical records. Stud. Health Technol. Inform. (2015)