DOI: 10.5555/1599081.1599222
research-article

Diagnostic evaluation of machine translation systems using automatically constructed linguistic check-points

Published: 18 August 2008

ABSTRACT

We present a diagnostic evaluation platform that provides multi-factored evaluation based on automatically constructed check-points. A check-point is a linguistically motivated unit (e.g. an ambiguous word, a noun phrase, a verb-object collocation, or a prepositional phrase) whose category is pre-defined in a linguistic taxonomy. We present a method that automatically extracts check-points from parallel sentences. By means of check-points, our method can monitor how an MT system translates important linguistic phenomena and so provide a diagnostic evaluation. The effectiveness of our approach for diagnostic evaluation is verified through experiments on various types of MT systems.
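
To make the idea of per-category diagnostic scores concrete, here is a minimal Python sketch. It is not the paper's method: the abstract does not specify how a check-point is matched against system output, so the matching rule below (the reference translation of the unit must appear as a contiguous token sequence in the MT output) is an assumption, as are the names `CheckPoint`, `checkpoint_hit`, and `score_by_category` and the toy category labels.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckPoint:
    """Hypothetical container for one check-point (names are illustrative)."""
    category: str   # taxonomy label, e.g. "noun_phrase"
    source: str     # the source-side linguistic unit
    reference: str  # reference translation of that unit


def checkpoint_hit(cp: CheckPoint, hypothesis: str) -> bool:
    """Assumed criterion: the reference translation occurs as a
    contiguous, case-insensitive token sequence in the MT output."""
    ref = cp.reference.lower().split()
    hyp = hypothesis.lower().split()
    n = len(ref)
    return n > 0 and any(hyp[i:i + n] == ref for i in range(len(hyp) - n + 1))


def score_by_category(items):
    """items: iterable of (CheckPoint, MT output sentence) pairs.
    Returns the fraction of matched check-points per category."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for cp, hypothesis in items:
        totals[cp.category] += 1
        hits[cp.category] += checkpoint_hit(cp, hypothesis)
    return {cat: hits[cat] / totals[cat] for cat in totals}


# Toy data, invented purely for illustration.
items = [
    (CheckPoint("noun_phrase", "src-1", "the red car"), "I saw the red car yesterday"),
    (CheckPoint("noun_phrase", "src-2", "a big house"), "he bought a large house"),
    (CheckPoint("ambiguous_word", "src-3", "bank"), "she went to the bank"),
]
scores = score_by_category(items)
```

Reporting one score per taxonomy category, rather than a single corpus-level number, is what lets such an evaluation point at the linguistic phenomena a system handles poorly.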

Published in

COLING '08: Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1
August 2008
1178 pages
ISBN: 9781905593446

Publisher: Association for Computational Linguistics, United States

Acceptance Rates

Overall Acceptance Rate: 1,537 of 1,537 submissions, 100%
