Improving statistical machine translation in the medical domain using the unified medical language system

Authors:
Matthias Eck, Carnegie Mellon University, Pittsburgh, PA
Stephan Vogel, Carnegie Mellon University, Pittsburgh, PA
Alex Waibel, Carnegie Mellon University, Pittsburgh, PA

COLING '04: Proceedings of the 20th international conference on Computational Linguistics, August 2004, Pages 792–es. https://doi.org/10.3115/1220355.1220469

Published: 23 August 2004

ABSTRACT

Handling texts from the medical domain is an important task for natural language processing. This paper investigates the usefulness of a large medical database, the Unified Medical Language System (UMLS), for translating dialogues between doctors and patients with a statistical machine translation system. We show that extracting a large dictionary and using semantic type information to generalize the training data significantly improve translation performance.
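
To make the idea of generalizing training data with semantic type information concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each known medical term maps to one semantic type label and that matched terms are replaced by a placeholder token before training. The toy lexicon, the `@TYPE` placeholder format, and the function name `generalize` are all hypothetical; a real system would draw its terms and types from the UMLS rather than from a hand-written dictionary.

```python
import re

# Hypothetical toy lexicon (term -> semantic type label), written by hand
# for illustration only; it is not extracted from the UMLS.
TOY_SEMANTIC_TYPES = {
    "aspirin": "PHARMACOLOGIC_SUBSTANCE",
    "ibuprofen": "PHARMACOLOGIC_SUBSTANCE",
    "headache": "SIGN_OR_SYMPTOM",
    "chest pain": "SIGN_OR_SYMPTOM",
}


def generalize(sentence, lexicon):
    """Replace every known term in `sentence` with an @TYPE placeholder.

    Longer terms are tried first so that multi-word terms such as
    "chest pain" are matched as a whole. Matching is case-insensitive
    and restricted to whole words.
    """
    if not lexicon:
        return sentence
    terms = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: "@" + lexicon[m.group(1).lower()], sentence)


if __name__ == "__main__":
    print(generalize("Do you take aspirin for your chest pain?", TOY_SEMANTIC_TYPES))
```

Under these assumptions, sentences that differ only in the specific drug or symptom collapse to the same generalized pattern, so a pattern observed with one term in the training data can also cover terms of the same type that never appeared there.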

Published in COLING '04 (August 2004, 1411 pages)
Publisher: Association for Computational Linguistics, United States
