Journal of Biomedical Informatics
Volume 35, Issue 4, August 2002, Pages 247-259
Sublanguage - Zellig Harris Memorial
 
doi:10.1016/S1532-0464(03)00014-5
Copyright © 2003 Elsevier Science (USA). All rights reserved.

Rutabaga by any other name: extracting biological names

Lynette Hirschman (corresponding author), Alexander A. Morgan and Alexander S. Yeh

The MITRE Corporation, MS K312, 202 Burlington Rd., Bedford, MA 01730, USA

Received 10 September 2002. 
Available online 12 March 2003.

Abstract

As the pace of biological research accelerates, biologists are becoming increasingly reliant on computers to manage the information explosion. Biologists communicate their research findings by relying on precise biological terms; these terms then provide indices into the literature and across the growing number of biological databases. This article examines emerging techniques to access biological resources through extraction of entity names and relations among them. Information extraction has been an active area of research in natural language processing and there are promising results for information extraction applied to news stories, e.g., balanced precision and recall in the 93–95% range for identifying person, organization and location names. But these results do not seem to transfer directly to biological names, where results remain in the 75–80% range. Multiple factors may be involved, including absence of shared training and test sets for rigorous measures of progress, lack of annotated training data specific to biological tasks, pervasive ambiguity of terms, frequent introduction of new terms, and a mismatch between evaluation tasks as defined for news and real biological problems. We present evidence from a simple lexical matching exercise that illustrates some specific problems encountered when identifying biological names. We conclude by outlining a research agenda to raise performance of named entity tagging to a level where it can be used to perform tasks of biological importance.
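
As a rough illustration of the kind of lexical matching and scoring the abstract refers to, the sketch below tags tokens by lookup in a name list and scores the result with precision, recall, and their harmonic mean (the balanced F-measure). It is not the authors' method: the lexicon, the sentence, the gold annotation, and the function names `match_lexicon` and `precision_recall_f` are all hypothetical, chosen only to show how an ambiguous name ("white" as a colour rather than a gene) lowers precision.

```python
def match_lexicon(tokens, lexicon):
    """Return the set of (start, end) token spans found in the lexicon.

    Greedy longest-match lookup; lexicon entries are lowercase strings
    whose words are separated by single spaces (an assumption of this sketch).
    """
    max_len = max((len(name.split()) for name in lexicon), default=0)
    spans = set()
    i = 0
    while i < len(tokens):
        # Try the longest candidate first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n]).lower()
            if candidate in lexicon:
                spans.add((i, i + n))
                i += n
                break
        else:
            i += 1  # no lexicon entry starts at this token
    return spans


def precision_recall_f(predicted, gold):
    """Score predicted spans against gold spans using exact span match."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Toy data (hypothetical): "white" is in the name list but is used here as a colour.
lexicon = {"rutabaga", "white"}
tokens = "the rutabaga mutant was placed on a white background".split()
gold = {(1, 2)}  # only "rutabaga" is annotated as a gene name

predicted = match_lexicon(tokens, lexicon)
precision, recall, f_measure = precision_recall_f(predicted, gold)
print(sorted(predicted), precision, recall, round(f_measure, 3))
```

In this toy case the lookup finds every annotated name but also tags the ordinary word "white", so recall is perfect while precision is halved. This is the kind of ambiguity the abstract lists among the factors that make biological names hard to extract.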

Article Outline

1. Background
1.1. Why names are important
1.2. Extracting names
2. Extracting names in biology
2.1. Information extraction for news
2.2. Information extraction in biology
3. Are names in biology harder than names in news?
3.1. The experience factor
3.2. Training data
3.3. Interannotator agreement and task definition
3.4. A systematic comparison of biology and news
4. Naming biological entities
4.1. Biological name formation
4.2. A lexical-based pattern matching experiment
5. Lessons learned
References


