ABSTRACT
Translation quality estimation is critical to reducing post-editing effort in machine translation and to cross-lingual corpus cleaning. As a research problem, quality estimation (QE) aims to directly estimate the quality of a translation for a given pair of source and target sentences, and to highlight the words that need correction, without reference to gold-standard translations. In this paper, we propose Verdi, a novel framework for word-level and sentence-level post-editing effort estimation for bilingual corpora. Verdi adopts two word predictors that extract diverse features from a sentence pair for subsequent quality estimation: a transformer-based neural machine translation (NMT) model and a pre-trained cross-lingual language model (XLM). We exploit the symmetric nature of bilingual corpora and apply model-level dual learning in the NMT predictor, which handles a primal task and a dual task simultaneously with weight sharing, yielding stronger context prediction ability than single-direction NMT models. Taking advantage of the dual learning scheme, we further design a novel feature that directly encodes the translated target information without relying on the source context. Extensive experiments on the WMT20 QE tasks demonstrate that our method beats the winner of the competition and outperforms other baseline methods by a large margin. We further use the sentence-level scores produced by Verdi to clean a parallel corpus and observe benefits in both model performance and training efficiency.
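The corpus-cleaning application mentioned above can be illustrated with a minimal sketch. Sentence-level QE scores in the WMT setting are predicted HTER values in [0, 1], where lower means less post-editing is needed; the threshold value and the helper function below are illustrative assumptions, not part of the paper's method:

```python
def clean_parallel_corpus(pairs, scores, max_hter=0.4):
    """Keep only sentence pairs whose predicted HTER is below a threshold.

    pairs    : list of (source, target) sentence tuples
    scores   : predicted sentence-level HTER in [0, 1]; lower = better
    max_hter : hypothetical cut-off; the paper does not prescribe a value
    """
    return [pair for pair, s in zip(pairs, scores) if s < max_hter]


# Toy corpus with scores standing in for a QE model's output.
corpus = [
    ("Guten Morgen", "Good morning"),
    ("Wie geht es dir?", "How is going you?"),  # noisy pair, high HTER
    ("Danke schoen", "Thank you very much"),
]
hter = [0.05, 0.85, 0.10]
cleaned = clean_parallel_corpus(corpus, hter)  # drops the noisy middle pair
```

Training an NMT model on `cleaned` rather than `corpus` is the setting in which the paper reports gains in both accuracy and training efficiency.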
- Verdi: Quality Estimation and Error Detection for Bilingual Corpora