Reciprocal Enrichment Between Basque Wikipedia and Machine Translation

Abstract

In this chapter, we define a collaboration framework that enables Wikipedia editors to generate new articles while helping the development of Machine Translation (MT) systems by providing post-edition logs. The framework was tested with editors of Basque Wikipedia, whose post-editing of Computer Science articles was used to improve the output of Matxin, a Spanish-to-Basque MT system. For the collaboration between editors and researchers, we selected a set of 100 articles from the Spanish Wikipedia to serve as source texts for translation into Basque with the MT engine. A group of volunteers from Basque Wikipedia reviewed and corrected the raw MT output. The collaboration produced two main benefits: (i) change logs that can help improve the MT engine through an automated statistical post-editing system, and (ii) the growth of Basque Wikipedia. The results show that this process can improve the accuracy of a rule-based MT (RBMT) system by nearly 10 %, drawing on the post-edition of 50,000 words in the Computer Science domain. We believe that our conclusions can be extended to MT engines for other less-resourced languages that lack large parallel corpora or frequently updated lexical knowledge, as well as to other domains.
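
To make the idea of a post-edition log concrete, the following is a minimal sketch in Python, not the chapter's actual tooling. It assumes a hypothetical record type (`PostEditRecord`) holding the Spanish source, the raw MT output, and the editor's correction. It shows two things such a log could support: pairing raw output with its correction as training data for a statistical post-editing step, and measuring how much the editor changed with a word-level edit rate. This edit rate is a simplification of TER: it counts only insertions, deletions, and substitutions, and omits TER's block shifts. The field names, function names, and toy tokens are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class PostEditRecord:
    """One segment of a hypothetical post-edition log."""
    source_es: str       # Spanish source sentence
    mt_output_eu: str    # raw MT output in Basque
    post_edited_eu: str  # the editor's corrected version


def word_edit_distance(hyp: list[str], ref: list[str]) -> int:
    """Word-level Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # delete a hypothesis word
                            curr[j - 1] + 1,      # insert a reference word
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def edit_rate(record: PostEditRecord) -> float:
    """Edits needed to turn the MT output into the post-edited text,
    normalised by the length of the post-edited text."""
    hyp = record.mt_output_eu.split()
    ref = record.post_edited_eu.split()
    if not ref:
        return 0.0 if not hyp else 1.0
    return word_edit_distance(hyp, ref) / len(ref)


def training_pairs(records: list[PostEditRecord]) -> list[tuple[str, str]]:
    """(raw MT output, corrected text) pairs, the monolingual 'parallel'
    data a statistical post-editing step would be trained on."""
    return [(r.mt_output_eu, r.post_edited_eu) for r in records]


if __name__ == "__main__":
    # Toy placeholder tokens, not real Basque sentences.
    log = [PostEditRecord("s1", "a b c d", "a x c")]
    print(training_pairs(log))
    print(round(edit_rate(log[0]), 3))
```

In this sketch, a lower average edit rate on held-out segments after adding a post-editing step would indicate that the system's output has moved closer to what the editors actually wrote.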

Notes

  1. There are around 700,000 speakers, around 25 % of the total population of the Basque Country.

  2. www.translationautomation.com

  3. http://translate.google.com

  4. http://www.commonsenseadvisory.com/Default.aspx?Contenttype=ArticleDetAD&tabID=63&Aid=1180&moduleId=390

  5. http://lingotek.com

  6. http://www.translingual-europe.eu/slides/WillemStoeller.pdf

  7. http://www.omegat.org

  8. http://translate.sourceforge.net/wiki/virtaal

  9. http://eu.wikipedia.org/wiki/Wikiproiektu:OpenMT2_eta_Euskal_Wikipedia

  10. http://eu.wikipedia.org/wiki/Wikiproiektu:OpenMT2_eta_Euskal_Wikipedia

  11. http://eu.wikipedia.org/w/index.php?title=Berezi:ZerkLotzenDuHona/Txantiloi:OpenMT-2

  12. http://ixa2.si.ehu.es/glabaka/OmegaT/OpenMT-OmegaT-CS-TM.zip

  13. http://eu.wikipedia.org/w/index.php?title=Berezi:ZerkLotzenDuHona/Txantiloi:OpenMT-2

  14. http://ixa2.si.ehu.es/glabaka/lokalizazioa.tmx

  15. http://ixa2.si.ehu.es/glabaka/OmegaT/OpenMT-OmegaT-CS-TM.zip

  16. http://www.unibertsitatea.net/blogak/testuak-lantzen/2011/11/22/wikigaiak4koa

  17. http://ixa2.si.ehu.es/glabaka/OmegaT/OpenMT-OmegaT.zip

  18. http://siuc01.si.ehu.es/~jipsagak/OpenMT_Wiki/Eskuliburua_Euwikipedia+Omegat+Matxin.pdf

  19. http://ixa2.si.ehu.es/matxin_zerb/translate.cgi

References

  1. Alegria I, Diaz de Ilarraza A, Labaka G, Lersundi M, Mayor A, Sarasola K (2007) Transfer-based MT from Spanish into Basque: reusability, standardization and open source. In: CICLing 2007. Lecture notes in computer science, vol 4394. Springer, Berlin/New York, pp 374–384

  2. Alegria I, Diaz de Ilarraza A, Labaka G, Lersundi M, Mayor A, Sarasola K (2011) Matxin-Informatika: Versión del traductor Matxin adaptada al dominio de la informática. In: Proceedings of the XXVII Congreso SEPLN, Huelva, Spain, pp 321–322

  3. Boitet C, Huynh CP, Nguyen HT, Bellynck V (2010) The iMAG concept: multilingual access gateway to an elected web sites with incremental quality increase through collaborative post-edition of MT pretranslations. In: Proceedings of Traitement Automatique du Langage Naturel, TALN, Montréal

  4. Diaz de Ilarraza A, Labaka G, Sarasola K (2008) Statistical post-editing: a valuable method in domain adaptation of RBMT systems. In: Proceedings of MATMT2008 workshop: mixing approaches to machine translation, Euskal Herriko Unibertsitatea, Donostia, pp 35–40

  5. Dugast L, Senellart J, Koehn P (2007) Statistical post-editing on SYSTRAN’s rule-based translation system. In: Proceedings of the second workshop on statistical machine translation, Prague, pp 220–223

  6. Dugast L, Senellart J, Koehn P (2009) Statistical post editing and dictionary extraction: Systran/Edinburgh submissions for ACL-WMT2009. In: Proceedings of the fourth workshop on statistical machine translation, Athens, pp 110–114

  7. Isabelle P, Goutte C, Simard M (2007) Domain adaptation of MT systems through automatic post-editing. In: Proceedings of the MT Summit XI, Copenhagen, pp 255–261

  8. Lagarda AL, Alabau V, Casacuberta F, Silva R, Díaz-de-Liaño E (2009) Statistical post-editing of a rule-based machine translation system. In: Proceedings of NAACL HLT 2009. Human language technologies: the 2009 annual conference of the North American chapter of the ACL, Short Papers, Boulder, pp 217–220

  9. Mayor A, Diaz de Ilarraza A, Labaka G, Lersundi M, Sarasola K (2011) Matxin, an open-source rule-based machine translation system for Basque. Mach Transl J 25(1):53–82

  10. Potet M, Esperança-Rodier E, Blanchon H, Besacier L (2011) Preliminary experiments on using users’ post-editions to enhance a SMT system. In: Forcada ML, Depraetere H, Vandeghinste V (eds) Proceedings of the 15th conference of the European association for machine translation, Leuven, Belgium, pp 161–168

  11. Simard M, Ueffing N, Isabelle P, Kuhn R (2007) Rule-based translation with statistical phrase-based post-editing. In: Proceedings of the second workshop on statistical machine translation, Prague, pp 203–206

  12. Snover M, Dorr B, Schwartz R, Micciulla L, Makhoul J (2007) A study of translation edit rate with targeted human annotation. In: Proceedings of the 7th Biennial conference of the association for machine translation in the Americas (AMTA), Cambridge, Massachusetts, USA, pp 223–231

  13. Way A (2010) Machine translation. In: Clark A, Fox C, Lappin S (eds) The handbook of computational linguistics and natural language processing. Wiley-Blackwell, Oxford, pp 531–573

Acknowledgements

This research was supported in part by the Spanish Ministry of Education and Science (OpenMT2, TIN2009-14675-C03-01) and by the Basque Government (Berbatek project, IE09–262). We are indebted to all the collaborators in the project and especially to the editors of the Basque Wikipedia. Elhuyar and Julen Ruiz helped us to collect resources for the customization of the RBMT engine to the domain of Computer Science.

Author information

Correspondence to Iñaki Alegria or Kepa Sarasola.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Alegria, I. et al. (2013). Reciprocal Enrichment Between Basque Wikipedia and Machine Translation. In: Gurevych, I., Kim, J. (eds) The People’s Web Meets NLP. Theory and Applications of Natural Language Processing. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35085-6_4

  • DOI: https://doi.org/10.1007/978-3-642-35085-6_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-35084-9

  • Online ISBN: 978-3-642-35085-6

  • eBook Packages: Computer Science, Computer Science (R0)
