ABSTRACT
We present a diagnostic evaluation platform that provides multi-factored evaluation based on automatically constructed check-points. A check-point is a linguistically motivated unit (e.g., an ambiguous word, a noun phrase, a verb-object collocation, or a prepositional phrase) that is pre-defined in a linguistic taxonomy. We present a method that automatically extracts check-points from parallel sentences. By means of check-points, our method can monitor how an MT system translates important linguistic phenomena and thereby provide diagnostic evaluation. The effectiveness of our approach for diagnostic evaluation is verified through experiments on various types of MT systems.
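To make the idea of per-category diagnostic scores concrete, here is a minimal sketch. It is an illustration under stated assumptions, not the scoring procedure of the platform: the abstract does not specify how a check-point is scored, so the sketch assumes a simple word-overlap measure between the check-point's reference translation and the system output. The names `checkpoint_hit` and `diagnostic_scores`, the category labels, and the data layout are all hypothetical.

```python
from collections import defaultdict


def checkpoint_hit(reference_words, hypothesis_words):
    """Fraction of the check-point's reference words found in the system output.

    ASSUMPTION: word overlap stands in for whatever matching the real
    platform uses; the abstract does not describe the scoring function.
    """
    if not reference_words:
        return 0.0
    hyp = set(hypothesis_words)
    matched = sum(1 for word in reference_words if word in hyp)
    return matched / len(reference_words)


def diagnostic_scores(checkpoints, hypotheses):
    """Average the check-point scores separately for each linguistic category.

    checkpoints: list of (sentence_id, category, reference_words) tuples,
                 a hypothetical layout for already-extracted check-points.
    hypotheses:  dict mapping sentence_id to the tokenized system output.
    Returns a dict mapping each category to its mean score.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for sentence_id, category, reference_words in checkpoints:
        hypothesis = hypotheses.get(sentence_id, [])
        totals[category] += checkpoint_hit(reference_words, hypothesis)
        counts[category] += 1
    return {category: totals[category] / counts[category] for category in totals}


# Hypothetical check-points and system outputs, for illustration only.
checkpoints = [
    (0, "noun_phrase", ["the", "red", "car"]),
    (0, "ambiguous_word", ["bank"]),
    (1, "noun_phrase", ["a", "dog"]),
]
hypotheses = {
    0: "the red car is near the shore".split(),
    1: "a cat".split(),
}
scores = diagnostic_scores(checkpoints, hypotheses)
for category, score in sorted(scores.items()):
    print(category, round(score, 2))
```

Reporting one score per category, rather than a single corpus-level number, is what lets such an evaluation point to the linguistic phenomena a system handles poorly.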
Index Terms
- Diagnostic evaluation of machine translation systems using automatically constructed linguistic check-points