DOI: 10.1145/3442381.3449931

Verdi: Quality Estimation and Error Detection for Bilingual Corpora

Published: 03 June 2021

ABSTRACT

Translation quality estimation is critical to reducing post-editing effort in machine translation and to cross-lingual corpus cleaning. As a research problem, quality estimation (QE) aims to directly estimate the quality of a translation for a given pair of source and target sentences, and to highlight the words that need correction, without access to gold reference translations. In this paper, we propose Verdi, a novel framework for word-level and sentence-level post-editing effort estimation for bilingual corpora. Verdi adopts two word predictors that allow diverse features to be extracted from a sentence pair for subsequent quality estimation: a transformer-based neural machine translation (NMT) model and a pre-trained cross-lingual language model (XLM). We exploit the symmetric nature of bilingual corpora and apply model-level dual learning in the NMT predictor, which handles a primal task and a dual task simultaneously with weight sharing, leading to stronger context prediction ability than single-direction NMT models. Taking advantage of the dual learning scheme, we further design a novel feature that directly encodes the translated target information without relying on the source context. Extensive experiments on the WMT20 QE tasks demonstrate that our method beats the winner of the competition and outperforms other baseline methods by a large margin. We further use the sentence-level scores produced by Verdi to clean a parallel corpus and observe benefits in both model performance and training efficiency.
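
To make the corpus-cleaning use case concrete, here is a minimal sketch (not the authors' implementation) of how sentence-level QE scores such as those produced by Verdi could drive parallel-corpus filtering before NMT training; the qe_score callable and the threshold value are illustrative assumptions, not part of the paper.

    from typing import Callable, Iterable, List, Tuple

    def clean_parallel_corpus(
        pairs: Iterable[Tuple[str, str]],
        qe_score: Callable[[str, str], float],  # hypothetical scorer; higher = better quality (assumed convention)
        threshold: float = 0.5,                 # illustrative cut-off; in practice tuned on held-out data
    ) -> List[Tuple[str, str]]:
        # Keep only (source, target) pairs whose estimated translation
        # quality meets the threshold; discard the rest as likely noise.
        return [(src, tgt) for src, tgt in pairs if qe_score(src, tgt) >= threshold]

The surviving pairs would then feed NMT training, which is where the paper reports gains in both model performance and training efficiency.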

Published in

WWW '21: Proceedings of the Web Conference 2021
April 2021, 4054 pages
ISBN: 9781450383127
DOI: 10.1145/3442381

Copyright © 2021 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Qualifiers

• research-article
• Research
• Refereed limited

Acceptance Rates

Overall acceptance rate: 1,899 of 8,196 submissions, 23%