Domain Adaptation of Statistical Machine Translation Models with Monolingual Data for Cross Lingual Information Retrieval

  • Conference paper

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 7814))

Abstract

Statistical Machine Translation (SMT) is often used as a black box in Cross-Lingual Information Retrieval (CLIR) tasks. We propose an adaptation method for an SMT model that relies on monolingual statistics extracted from the document collection (both source and target, if available). We evaluate our approach on the CLEF Domain Specific task (German-English and English-German) and show that very simple document collection statistics, integrated into the SMT translation model, yield good gains in terms of both IR metrics (MAP, P10) and MT evaluation metrics (BLEU, TER).

References

  1. Clinchant, S., Renders, J.-M.: Query Translation through Dictionary Adaptation. In: Peters, C., Jijkoun, V., Mandl, T., Müller, H., Oard, D.W., Peñas, A., Petras, V., Santos, D. (eds.) CLEF 2007. LNCS, vol. 5152, pp. 182–187. Springer, Heidelberg (2008)

  2. Koehn, et al.: Statistical phrase based translation. In: HLT/NAACL (2003)

  3. Klementiev, A., Irvine, A., Callison-Burch, C., Yarowsky, D.: Toward statistical machine translation without parallel corpora. In: EACL

  4. Koehn, P.: Europarl: A multilingual corpus for evaluation of machine translation. In: MT Summit (2005)

  5. Magdy, W., Jones, G.J.F.: Should MT Systems Be Used as Black Boxes in CLIR? In: Clough, P., Foley, C., Gurrin, C., Jones, G.J.F., Kraaij, W., Lee, H., Murdock, V. (eds.) ECIR 2011. LNCS, vol. 6611, pp. 683–686. Springer, Heidelberg (2011)

  6. Nikoulina, V., Kovachev, B., Lagos, N., Monz, C.: Adaptation of statistical machine translation model for cross-lingual information retrieval in a service context. In: EACL (2012)

  7. Papineni, K., Roukos, S., Ward, T., Zhu, W.: Bleu: a method for automatic evaluation of machine translation (2001)

  8. Snover, M., Dorr, B., Schwartz, R., Micciulla, L., Makhoul, J.: A study of translation edit rate with targeted human annotation. In: AMTA (2006)

  9. Su, J., Wu, H., Wang, H., Chen, Y., Shi, X., Dong, H., Liu, Q.: Translation model adaptation for statistical machine translation with monolingual topic information. In: ACL (2012)

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Nikoulina, V., Clinchant, S. (2013). Domain Adaptation of Statistical Machine Translation Models with Monolingual Data for Cross Lingual Information Retrieval. In: Serdyukov, P., et al. Advances in Information Retrieval. ECIR 2013. Lecture Notes in Computer Science, vol 7814. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36973-5_80

  • DOI: https://doi.org/10.1007/978-3-642-36973-5_80

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-36972-8

  • Online ISBN: 978-3-642-36973-5

  • eBook Packages: Computer Science, Computer Science (R0)
