Adaptation of machine translation for multilingual information retrieval in the medical domain

https://doi.org/10.1016/j.artmed.2014.01.004

Abstract

Objective

We investigate machine translation (MT) of user search queries in the context of cross-lingual information retrieval (IR) in the medical domain. The main focus is on techniques to adapt MT to increase translation quality; however, we also explore MT adaptation to improve effectiveness of cross-lingual IR.

Methods and data

Our MT system is Moses, a state-of-the-art phrase-based statistical machine translation system. The IR system is based on the BM25 retrieval model implemented in the Lucene search engine. The MT techniques employed in this work include in-domain training and tuning, intelligent training data selection, optimization of phrase table configuration, compound splitting, and exploiting synonyms as translation variants. The IR methods include morphological normalization and using multiple translation variants for query expansion. The experiments are performed and thoroughly evaluated on three language pairs: Czech–English, German–English, and French–English. MT quality is evaluated on data sets created within the Khresmoi project and IR effectiveness is tested on the CLEF eHealth 2013 data sets.

Results

The search query translation results achieved in our experiments are outstanding – our systems outperform not only our strong baselines, but also Google Translate and Microsoft Bing Translator in direct comparison carried out on all the language pairs. The baseline BLEU scores increased from 26.59 to 41.45 for Czech–English, from 23.03 to 40.82 for German–English, and from 32.67 to 40.82 for French–English. This is a 55% improvement on average. In terms of the IR performance on this particular test collection, a significant improvement over the baseline is achieved only for French–English. For Czech–English and German–English, the increased MT quality does not lead to better IR results.

Conclusions

Most of the MT techniques employed in our experiments improve MT of medical search queries. Intelligent training data selection in particular proves to be very successful for domain adaptation of MT. Certain improvements are also obtained from German compound splitting on the source language side. Translation quality, however, does not appear to correlate with IR performance: better translation does not necessarily yield better retrieval. We discuss in detail the contribution of the individual techniques and state-of-the-art features and provide future research directions.

Introduction

The development of health information search and retrieval techniques is an important research topic. Indeed, it has been found that almost 70% of search engine users in the US have conducted a web search for information about a specific disease or health problem [1]. Given that much medical content is written in English, research to date in the medical space has predominantly focused on monolingual English retrieval. However, given the large number of non-English-speaking users of the Internet and the lack of content in their native languages, support for them to search and utilize these English sources is required if the value of the information available on the Internet is to be fully realized [2]. In a recent study, Lopes and Ribeiro [3] assessed the effect of translating health queries for users with different levels of English language proficiency. Their results confirmed that users with even basic competence in English can benefit from a system which automatically retrieves English content based on a non-English query, or at least suggests English translations of the non-English queries.

Support for search of English language content by non-native English speakers is one of the major goals of the large integrated EU-funded Khresmoi project. Among other goals, including joint text and image retrieval of radiodiagnostic records, the Khresmoi project aims to develop technology for transparent cross-lingual search of medical sources, for both professionals and laypeople, with the emphasis primarily on publicly available web sources. While a sophisticated search interface is being developed for the needs of medical professionals, the final application for the general public should be as simple as possible to operate and similar to the well-known interfaces of web search engines in use today with the addition of cross-lingual functionality.

The languages supported by the Khresmoi project are English (EN), Czech (CS), German (DE), and French (FR). Queries are posed in Czech, German, or French and are machine-translated to English. This reflects both the real availability of data, which is predominantly in English, and the query translation needs of non-native speakers of English. Our focus in this paper is on the machine translation (MT) part of the cross-lingual search and retrieval task; we use a standard information retrieval (IR) technique for the search and retrieval part, in order to pinpoint the contributions and problems of using MT for query translation from the three selected languages (Czech, German, and French) into English and its influence on the quality of the retrieved document sets.

Our MT system is based on Moses [4], a state-of-the-art statistical MT system. The IR experiments are performed using the Lucene search engine on the CLEF eHealth 2013 dataset for the languages specified above, directed towards retrieving English documents only. Since MT is only an intermediate component of the whole system pipeline, we proceed in two steps. We first independently tune MT to produce the best possible translations of queries (Section 2) and then use various techniques to modify and expand the translated queries for improved IR performance (Section 3). The methods applied in Section 2 include: in-domain training and tuning, intelligent training data selection, optimization of phrase table configuration, exploiting synonyms to construct translation variants, and decompounding (splitting) of complex German words on the source language side, which normally appear as unknown words. For evaluation of translation quality itself, we use BLEU – the de facto standard automatic evaluation metric [5], which compares MT output against manual reference translation and accounts both for adequacy and fluency (word order) of the machine translation. We also report inverse position-independent word error rate [6], called PER, another automatic evaluation metric which compares words in the MT output and the reference translation but without taking the word order into account and thus might be better suited to application of MT in IR, where word order is often ignored. In selected experiments, the automatic evaluation is supplemented by manual assessment of the results performed by medical professionals.
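To make the difference between an order-sensitive metric and PER concrete, the following minimal Python sketch computes a position-independent word error rate over bags of words. It uses one common formulation (errors are the unmatched tokens, normalized by reference length) and assumes that the "inverse" score is simply 1 − PER; the exact definition used in the paper is not given in this excerpt, and the function names `per` and `inverse_per` are illustrative only.

```python
from collections import Counter


def per(hypothesis: str, reference: str) -> float:
    """Position-independent word error rate (one common formulation).

    Both strings are treated as bags of lowercased, whitespace-separated
    tokens, so word order has no effect on the score.
    """
    hyp = hypothesis.lower().split()
    ref = reference.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one token")
    # Multiset intersection: tokens present in both hypothesis and reference.
    matches = sum((Counter(hyp) & Counter(ref)).values())
    # Unmatched tokens count as errors (covers substitutions, insertions, deletions).
    errors = max(len(hyp), len(ref)) - matches
    return errors / len(ref)


def inverse_per(hypothesis: str, reference: str) -> float:
    """Assumed definition of the inverse score: 1 - PER (higher is better).

    Can be negative when the hypothesis is much longer than the reference.
    """
    return 1.0 - per(hypothesis, reference)


if __name__ == "__main__":
    reference = "pain in the chest"
    # Same words in a different order: no position-independent errors.
    print(inverse_per("the chest in pain", reference))
    # Two reference words missing from the hypothesis.
    print(inverse_per("chest pain", reference))
```

Because the comparison is made on token counts alone, a reordered but otherwise complete translation is not penalized, which is the property that makes this kind of metric a plausible fit for bag-of-words retrieval.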

The results of our MT experiments on queries show that we are able to outperform Google Translate, the best freely available MT service on the web. We also find that using synonyms to enrich training data with translation variants does not improve the MT performance; however, decompounding of complex German words slightly improves the translation, at least according to BLEU. In Section 3, we evaluate query translation in a cross-lingual IR setting using standard methods on the CLEF eHealth 2013 Task 3 test collection. Here, despite achieving superior performance on the query MT task, as described in Section 2, we do not outperform the retrieval results obtained by using queries translated by Google Translate. In the last section, we perform a summary analysis of the overall results, the results of the individual techniques for improving MT performance and their integration into an IR system, and give suggestions for further work.
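As an illustration of what decompounding involves, the sketch below splits a German compound into known parts and picks the segmentation with the highest geometric mean of part frequencies. This follows the spirit of the frequency-based approach in the empirical compound splitting work cited in the reference list; it is not confirmed to be the method used in this paper. The function name `split_compound`, the `min_len` parameter, and the toy frequency table are all hypothetical, and linking elements such as the German "s" are ignored for brevity.

```python
from math import prod


def split_compound(word, freq, min_len=3):
    """Return the segmentation of `word` into known parts that has the
    highest geometric mean of part frequencies; unknown words stay whole."""
    word = word.lower()

    def segmentations(w):
        # The unsplit string is a candidate if it is itself a known word.
        if w in freq:
            yield [w]
        # Try every split point that leaves at least `min_len` chars on each side.
        for i in range(min_len, len(w) - min_len + 1):
            head = w[:i]
            if head in freq:
                for tail in segmentations(w[i:]):
                    yield [head] + tail

    def score(parts):
        # Geometric mean of the corpus frequencies of the parts.
        return prod(freq[p] for p in parts) ** (1.0 / len(parts))

    candidates = list(segmentations(word))
    if not candidates:
        return [word]
    return max(candidates, key=score)


# Hypothetical corpus frequencies, for illustration only.
TOY_FREQ = {"blut": 500, "druck": 400, "messung": 300, "blutdruck": 50}

if __name__ == "__main__":
    print(split_compound("Blutdruckmessung", TOY_FREQ))
    print(split_compound("Aspirin", TOY_FREQ))
```

With the toy counts above, the three-way split of "Blutdruckmessung" scores higher than keeping "blutdruck" as one part, while a word with no known parts is returned unchanged. In an MT pipeline the resulting parts would replace the original token on the source side before translation, so that a compound unseen in training can still be translated piece by piece.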

Section snippets

Machine translation for medical queries

In this section, we describe the application of phrase-based statistical machine translation (SMT) to the translation of medical queries with the goal of producing accurate and fluent translations. This task differs from typical MT applications in two aspects: the domain and the genre of the input text. The domain, which reflects what the text is about, is very specific, characterized by a large and specialized vocabulary which does not occur in general texts. The genre, which indicates the

Optimizing query translation for cross-lingual information retrieval

In a standard MT scenario, the MT system is optimized to produce output intended to be read by a human. However, if used in a cross-lingual IR (CLIR) system, the consumer of the MT output is a computer system performing IR. Such systems usually do not require the input to be linguistically fluent or grammatically correct. Word order can be loose, and the accuracy of function words and of other words deemed irrelevant to IR (traditionally called stopwords) does not matter. On the other

Conclusions

In this work, we explored cross-lingual IR in the domain of medicine and focused on machine translation as a key component introducing the possibility to search in a multilingual environment. We translate queries in Czech, German, and French to English and perform search on a collection of English documents from CLEF eHealth 2013 Task 3. Such a task is especially challenging when applied to a specific domain, such as medicine, because traditional MT systems are not generally tuned to translate

Acknowledgments

This work was supported by the EU FP7 project Khresmoi (contract no. 257528), the Czech Science Foundation (grant no. P103/12/G084), the Science Foundation Ireland (grant no. 07/CE/I1142) as part of the Centre for Next Generation Localisation at Dublin City University, and by the ESF project ELIAS.

The work described herein uses language resources hosted by the LINDAT/CLARIN repository, funded by the project LM2010013 of the MEYS of the Czech Republic.

References (135)

  • C. Bizer et al.

    DBpedia – a crystallization point for the web of data

    Web Semantics: Science, Services and Agents on the World Wide Web

    (2009)
  • S. Fox

    Health Topics: 80% of internet users look for health information online, Technical Report

    (2011)
  • R.J.W. Cline et al.

    Consumer health information seeking on the internet: the state of the art

    Health Education Research

    (2001)
  • C.T. Lopes et al.

    Measuring the value of health query translation: an analysis by user language proficiency

    Journal of the American Society for Information Science and Technology

    (2013)
  • P. Koehn et al.

    Moses: open source toolkit for statistical machine translation

  • K. Papineni et al.

    BLEU: a method for automatic evaluation of machine translation

  • C. Tillmann et al.

    Accelerated DP based search for statistical translation

  • F. Jelinek

    Statistical methods for speech recognition

    (1997)
  • F.J. Och et al.

    A systematic comparison of various statistical alignment models

    Computational Linguistics

    (2003)
  • F.J. Och

    Minimum error rate training in statistical machine translation

  • N. Bertoldi et al.

    Improved minimum error rate training in Moses

    Prague Bulletin of Mathematical Linguistics

    (2009)
  • P. Koehn

    Europarl: a parallel corpus for statistical machine translation

  • S. Roukos et al.

    Hansard corpus of parallel English and French

    (1995)
  • R. Steinberger et al.

    The JRC-Acquis: a multilingual aligned parallel corpus with 20+ languages

  • C. Callison-Burch et al.

    Findings of the 2012 workshop on statistical machine translation

  • P. Pecina et al.

    Domain adaptation of statistical machine translation using web-crawled resources: A case study

  • P. Langlais

    Improving a general-purpose statistical translation engine by terminological lexicons

  • G. Sanchis-Trilles et al.

    Log-linear weight optimisation via bayesian adaptation in statistical machine translation

  • A. Bisazza et al.

    Fill-up versus interpolation methods for phrase-based SMT adaptation

  • P. Nakov

    Improving English–Spanish statistical machine translation: experiments in domain adaptation, sentence paraphrasing tokenization and recasing

  • P. Koehn et al.

    Experiments in domain adaptation for statistical machine translation

  • H. Wu et al.

    Improving domain-specific word alignment with a general bilingual corpus

  • M. Carpuat et al.

    Domain adaptation in machine translation: Final report

  • M. Eck et al.

    Language model adaptation for statistical machine translation based on information retrieval

  • R.C. Moore et al.

    Intelligent selection of language model training data

  • A.S. Hildebrand et al.

    Adaptation of the translation model for statistical machine translation based on information retrieval

  • A. Axelrod et al.

    Domain adaptation via pseudo in-domain data selection

  • S. Mansour et al.

    Combining translation and language model scoring for domain-specific data filtering

  • W. Byrne et al.

    Automatic recognition of spontaneous speech for access to multilingual oral history archives

    IEEE Transactions on Speech and Audio Processing

    (2004)
  • D.S. Munteanu et al.

    Improving machine translation performance by exploiting non-parallel corpora

    Computational Linguistics

    (2005)
  • H. Daumé et al.

    Domain adaptation for machine translation by mining unseen words

  • N. Bertoldi et al.

    Domain adaptation for statistical machine translation with monolingual resources

  • P. Pecina et al.

    Towards using web-crawled data for domain adaptation in statistical machine translation

  • A. Ceausu et al.

    Experiments on domain adaptation for patent machine translation in the PLuTO project

  • C. Callison-Burch et al.

    Findings of the 2011 Workshop on Statistical Machine Translation

  • P. Banerjee et al.

    Domain adaptation in SMT of user-generated forum content guided by OOV word reduction: Normalization and/or supplementary data?

  • A. Bisazza et al.

    Cutting the long tail: hybrid language models for translation style adaptation

  • M. Fishel et al.

    From subtitles to parallel corpora

  • V. Nikoulina et al.

    Adaptation of statistical machine translation model for cross-lingual information retrieval in a service context

  • M. Eck et al.

    Improving statistical machine translation in the medical domain using the Unified Medical Language System

  • U.S. National Library of Medicine

    UMLS reference manual

    (2009)
  • C. Wu et al.

    Statistical machine translation for biomedical text: are we there yet?

    AMIA Annual Symposium Proceedings

    (2011)
  • M.R. Costa-jussà et al.

    Machine translation in medicine. A quality analysis of statistical machine translation in the medical domain

  • A. Jimeno Yepes et al.

    Combining MEDLINE and publisher data to create parallel corpora for the automatic translation of biomedical text

    BMC Bioinformatics

    (2013)
  • A. Chen

    Cross-language retrieval experiments at CLEF 2002

  • P. Koehn et al.

    Empirical methods for compound splitting

  • M. Popović et al.

    Statistical machine translation of German compound words

  • S. Niessen et al.

    Improving SMT quality with morpho-syntactic analysis

  • E. Alfonseca et al.

    Decompounding query keywords from compounding languages

  • H. Wu et al.

    Optimizing synonym extraction using monolingual and bilingual resources
