ABSTRACT
Texts from the medical domain are an important application area for natural language processing. This paper investigates the usefulness of a large medical database, the Unified Medical Language System (UMLS), for translating dialogues between doctors and patients with a statistical machine translation system. We show that extracting a large dictionary from the UMLS and using its semantic type information to generalize the training data significantly improve translation performance.
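The two ideas in the abstract, dictionary extraction and semantic-type generalization, can be pictured with the minimal sketch below. It is not the paper's implementation. It assumes pipe-delimited rows in the UMLS MRCONSO.RRF layout (field 0 = concept ID, field 1 = language code, field 14 = term string) and the language codes `ENG` and `SPA` as an example pair. The function names `build_bilingual_dictionary` and `generalize`, the `@`-prefixed class tokens, and the toy lexicon are all hypothetical. The dictionary step pairs strings that share a concept ID. The generalization step replaces known medical terms with their semantic type, so that sentences differing only in the specific symptom or disease share one training pattern.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def build_bilingual_dictionary(
    mrconso_rows: Iterable[str], src_lat: str = "ENG", tgt_lat: str = "SPA"
) -> Dict[str, Set[str]]:
    """Pair source- and target-language strings that share a UMLS concept (CUI).

    Assumes MRCONSO.RRF-style rows: field 0 = CUI, field 1 = language (LAT),
    field 14 = term string (STR). Strings are lower-cased.
    """
    by_cui: Dict[str, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for row in mrconso_rows:
        fields = row.rstrip("\n").split("|")
        by_cui[fields[0]][fields[1]].add(fields[14].lower())

    dictionary: Dict[str, Set[str]] = defaultdict(set)
    for langs in by_cui.values():
        targets = langs.get(tgt_lat, set())
        if not targets:
            continue  # concept has no target-language string; nothing to pair
        for src in langs.get(src_lat, set()):
            dictionary[src] |= targets
    return dict(dictionary)


def generalize(
    tokens: List[str], lexicon: Dict[Tuple[str, ...], str]
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Replace the longest matching lexicon terms with class tokens like '@T184'.

    Returns the generalized tokens and the (original term, semantic type) pairs
    that were replaced, so the terms can later be translated separately.
    """
    max_len = max((len(key) for key in lexicon), default=1)
    out: List[str] = []
    replaced: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        # Try the longest span first so "chest pain" wins over "pain".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = tuple(t.lower() for t in tokens[i:i + n])
            sem_type = lexicon.get(span)
            if sem_type is not None:
                out.append("@" + sem_type)
                replaced.append((" ".join(tokens[i:i + n]), sem_type))
                i += n
                break
        else:
            # No lexicon entry starts here; keep the token unchanged.
            out.append(tokens[i])
            i += 1
    return out, replaced


if __name__ == "__main__":
    # Toy, hypothetical lexicon: lower-cased token tuple -> UMLS semantic type ID.
    lexicon = {
        ("chest", "pain"): "T184",  # Sign or Symptom
        ("pain",): "T184",
        ("diabetes",): "T047",      # Disease or Syndrome
    }
    print(generalize("I have chest pain and diabetes".split(), lexicon))
    # (['I', 'have', '@T184', 'and', '@T047'], [('chest pain', 'T184'), ('diabetes', 'T047')])
```

Matching the longest span first keeps multi-word terms intact, and returning the replaced pairs leaves room for a decoder to substitute dictionary translations back in; both are design choices of this sketch rather than details stated in the abstract.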
- Improving statistical machine translation in the medical domain using the unified medical language system