ABSTRACT
Translation quality estimation is critical to reducing post-editing effort in machine translation and to cross-lingual corpus cleaning. As a research problem, quality estimation (QE) aims to directly estimate the quality of a translation for a given pair of source and target sentences, and to highlight the words that need correction, without reference to gold-standard translations. In this paper, we propose Verdi, a novel framework for word-level and sentence-level post-editing effort estimation for bilingual corpora. Verdi adopts two word predictors that extract diverse features from a sentence pair for subsequent quality estimation: a transformer-based neural machine translation (NMT) model and a pre-trained cross-lingual language model (XLM). We exploit the symmetric nature of bilingual corpora and apply model-level dual learning in the NMT predictor, which handles a primal task and a dual task simultaneously with weight sharing, yielding stronger context prediction ability than single-direction NMT models. Taking advantage of the dual learning scheme, we further design a novel feature that directly encodes the translated target information without relying on the source context. Extensive experiments on the WMT20 QE tasks demonstrate that our method beats the winner of the competition and outperforms other baseline methods by a large margin. We further use the sentence-level scores produced by Verdi to clean a parallel corpus and observe benefits in both model performance and training efficiency.
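The corpus-cleaning application mentioned above can be illustrated with a minimal sketch. Sentence-level QE scores in the WMT setting are predicted HTER values in [0, 1], where lower means less post-editing is needed; the threshold value and the helper function below are illustrative assumptions, not part of the paper's method:

```python
def clean_parallel_corpus(pairs, scores, max_hter=0.4):
    """Keep only sentence pairs whose predicted HTER is below a threshold.

    pairs    : list of (source, target) sentence tuples
    scores   : predicted sentence-level HTER in [0, 1]; lower = better
    max_hter : hypothetical cut-off; the paper does not prescribe a value
    """
    return [pair for pair, s in zip(pairs, scores) if s < max_hter]


# Toy corpus with scores standing in for a QE model's output.
corpus = [
    ("Guten Morgen", "Good morning"),
    ("Wie geht es dir?", "How is going you?"),  # noisy pair, high HTER
    ("Danke schoen", "Thank you very much"),
]
hter = [0.05, 0.85, 0.10]
cleaned = clean_parallel_corpus(corpus, hter)  # drops the noisy middle pair
```

Training an NMT model on `cleaned` rather than `corpus` is the setting in which the paper reports gains in both accuracy and training efficiency.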
- Verdi: Quality Estimation and Error Detection for Bilingual Corpora