Natural Language Processing in Biomedicine: A Unified System Architecture Overview

Doan, Son; Conway, Mike; Phuong, Tu Minh; Ohno-Machado, Lucila

doi:10.1007/978-1-4939-0847-9_16

Part of the book series: Methods in Molecular Biology (MIMB, volume 1168)

Abstract

In contemporary electronic medical records, much of the clinically important data (signs and symptoms, symptom severity, disease status, etc.) are not provided in structured data fields but are instead encoded in clinician-generated narrative text. Natural language processing (NLP) provides a means of unlocking this important data source for applications in clinical decision support, quality assurance, and public health. This chapter provides an overview of representative NLP systems in biomedicine based on a unified architectural view. The general architecture of an NLP system consists of two main components: background knowledge, which includes biomedical knowledge resources, and a framework that integrates NLP tools to process text. Systems differ in both components, which we review briefly. Additionally, the challenges facing current research efforts in biomedical NLP include the paucity of large, publicly available annotated corpora, although initiatives that facilitate data sharing, system evaluation, and collaborative work between researchers in clinical NLP are starting to emerge.
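
The two-component view described in the abstract can be illustrated with a minimal sketch. It is not the architecture of any system reviewed in the chapter: the lexicon entries, concept identifiers, and the names `TOY_LEXICON`, `Document`, `tokenize`, `make_concept_mapper`, and `run_pipeline` are all hypothetical, chosen only to show background knowledge kept separate from a framework that chains processing tools.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Component 1: background knowledge. A toy stand-in for a biomedical
# knowledge resource; the terms and concept IDs are invented for illustration.
TOY_LEXICON: Dict[str, str] = {
    "chest pain": "CONCEPT_001",
    "fever": "CONCEPT_002",
}


@dataclass
class Document:
    """Shared container that each processing stage reads and annotates."""
    text: str
    tokens: List[str] = field(default_factory=list)
    concepts: List[Dict[str, str]] = field(default_factory=list)


def tokenize(doc: Document) -> Document:
    # Lowercase the text and keep alphabetic runs as tokens.
    doc.tokens = re.findall(r"[a-z]+", doc.text.lower())
    return doc


def make_concept_mapper(lexicon: Dict[str, str]) -> Callable[[Document], Document]:
    # Build a stage that maps lexicon terms found in the token stream to IDs.
    def map_concepts(doc: Document) -> Document:
        normalized = " ".join(doc.tokens)
        for term, concept_id in lexicon.items():
            if re.search(r"\b" + re.escape(term) + r"\b", normalized):
                doc.concepts.append({"term": term, "id": concept_id})
        return doc
    return map_concepts


# Component 2: the framework. It only knows how to run stages in order,
# so tools or knowledge resources can be swapped independently.
def run_pipeline(text: str, stages: List[Callable[[Document], Document]]) -> Document:
    doc = Document(text)
    for stage in stages:
        doc = stage(doc)
    return doc


PIPELINE = [tokenize, make_concept_mapper(TOY_LEXICON)]
result = run_pipeline("Patient reports chest pain and mild fever.", PIPELINE)
print(result.concepts)
```

Because the lexicon is passed into the mapper rather than hard-coded in the framework, either component could in principle be replaced without touching the other, which is the sense in which systems may differ in both components.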

Abbreviations

BNF: Backus–Naur form
cTAKES: Clinical Text Analysis and Knowledge Extraction System
EMR: Electronic medical record
GATE: General Architecture for Text Engineering
LSP: Linguistic String Project
MedLEE: Medical Language Extraction and Encoding System
MLP: Medical language processor
NER: Named entity recognition
NLP: Natural language processing
POS: Part of speech
UIMA: Unstructured Information Management Architecture
UMLS: Unified Medical Language System

Acknowledgements

S.D. and L.O.M. were funded in part by NIH grants U54HL108460 and UH3HL108785.

Author information

Authors and Affiliations

Division of Biomedical Informatics, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA, USA
Son Doan & Lucila Ohno-Machado
Division of Behavioral Medicine, University of California, San Diego, La Jolla, CA, USA
Mike Conway
Department of Computer Science, Posts and Telecommunications Institute of Technology, Hanoi, Vietnam
Tu Minh Phuong

Corresponding author

Correspondence to Son Doan.

Editor information

Editors and Affiliations

Department of Medical Genomics, Royal Prince Alfred Hospital and Sydney Medical School, University of Sydney, Camperdown, Australia
Ronald Trent

About this protocol

Cite this protocol

Doan, S., Conway, M., Phuong, T.M., Ohno-Machado, L. (2014). Natural Language Processing in Biomedicine: A Unified System Architecture Overview. In: Trent, R. (eds) Clinical Bioinformatics. Methods in Molecular Biology, vol 1168. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-0847-9_16

DOI: https://doi.org/10.1007/978-1-4939-0847-9_16
Published: 09 May 2014
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-0846-2
Online ISBN: 978-1-4939-0847-9
eBook Packages: Springer Protocols
