Natural Language Processing in Biomedicine: A Unified System Architecture Overview

  • Protocol
Clinical Bioinformatics

Part of the book series: Methods in Molecular Biology ((MIMB,volume 1168))

Abstract

In contemporary electronic medical records, much of the clinically important data (signs and symptoms, symptom severity, disease status, etc.) are not provided in structured data fields but are instead encoded in clinician-generated narrative text. Natural language processing (NLP) provides a means of unlocking this important data source for applications in clinical decision support, quality assurance, and public health. This chapter provides an overview of representative NLP systems in biomedicine based on a unified architectural view. The general architecture of an NLP system consists of two main components: background knowledge, which includes biomedical knowledge resources, and a framework that integrates NLP tools to process text. Systems differ in both components, which we review briefly. A further challenge facing current research efforts in biomedical NLP is the paucity of large, publicly available annotated corpora, although initiatives that facilitate data sharing, system evaluation, and collaborative work between researchers in clinical NLP are starting to emerge.
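
The two-component view described in the abstract can be illustrated with a minimal sketch. It is not taken from any of the systems the chapter reviews: the toy lexicon, the placeholder concept identifiers (`X001` and so on, which are not real UMLS codes), and the function names `tokenize`, `lookup_concepts`, and `process` are all assumptions made for illustration. The background knowledge is a small dictionary, and the framework chains a tokenizer with a greedy longest-match dictionary lookup.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    concept_id: str  # placeholder identifier, NOT a real UMLS code
    name: str


# Component 1: background knowledge (a toy stand-in for a biomedical lexicon).
KNOWLEDGE = {
    "fever": Concept("X001", "Fever"),
    "chest pain": Concept("X002", "Chest pain"),
    "pneumonia": Concept("X003", "Pneumonia"),
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def lookup_concepts(tokens, knowledge, max_len=3):
    """Greedy longest-match dictionary lookup over token n-grams."""
    found = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in knowledge:
                found.append(knowledge[phrase])
                i += n
                break
        else:
            # No phrase starting at this token matched; move on.
            i += 1
    return found


def process(text, knowledge=KNOWLEDGE):
    """Component 2: the framework, which chains the tools over the text."""
    return lookup_concepts(tokenize(text), knowledge)
```

Calling `process("Patient reports chest pain and fever; no pneumonia.")` returns the concepts for chest pain, fever, and pneumonia in that order. The sketch does no negation handling, so "no pneumonia" is still reported as a mention; a real system would add such steps as further tools in the framework.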

Abbreviations

BNF: Backus–Naur form

cTAKES: Clinical Text Analysis and Knowledge Extraction System

EMR: Electronic medical record

GATE: General Architecture for Text Engineering

LSP: Linguistic String Project

MedLEE: Medical Language Extraction and Encoding System

MLP: Medical language processor

NER: Named entity recognition

NLP: Natural language processing

POS: Part of speech

UIMA: Unstructured Information Management Architecture

UMLS: Unified Medical Language System


Acknowledgements

S.D. and L.O.M. were funded in part by NIH grants U54HL108460 and UH3HL108785.

Author information

Corresponding author

Correspondence to Son Doan.

Copyright information

© 2014 Springer Science+Business Media New York

About this protocol

Cite this protocol

Doan, S., Conway, M., Phuong, T.M., Ohno-Machado, L. (2014). Natural Language Processing in Biomedicine: A Unified System Architecture Overview. In: Trent, R. (ed.) Clinical Bioinformatics. Methods in Molecular Biology, vol 1168. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-0847-9_16

  • DOI: https://doi.org/10.1007/978-1-4939-0847-9_16

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-0846-2

  • Online ISBN: 978-1-4939-0847-9

  • eBook Packages: Springer Protocols
