DOI: 10.3115/1220575.1220668
Article
Free Access

BLANC: learning evaluation metrics for MT

Published: 06 October 2005

ABSTRACT

We introduce BLANC, a family of dynamic, trainable evaluation metrics for machine translation. Flexible, parametrized models can be learned from past data and automatically optimized to correlate well with human judgments for different criteria (e.g. adequacy, fluency) using different correlation measures. Towards this end, we discuss ACS (all common skip-ngrams), a practical algorithm with trainable parameters that estimates reference-candidate translation overlap by computing a weighted sum of all common skip-ngrams in polynomial time. We show that the BLEU and ROUGE metric families are special cases of BLANC, and we compare correlations with human judgments across these three metric families. We analyze the algorithmic complexity of ACS and argue that it is more powerful in modeling both local meaning and sentence-level structure, while offering the same practicality as the established algorithms it generalizes.
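
The abstract only sketches how ACS works, so the short Python example below (not taken from the paper; function and variable names are hypothetical) illustrates the core polynomial-time idea: counting every common skip-ngram, i.e. every ordered subsequence with arbitrary gaps, shared by a candidate and a reference translation, via dynamic programming. The trainable weights that BLANC attaches to each common skip-ngram (e.g. gap or length penalties) are deliberately left out.

    def common_skip_ngram_counts(candidate, reference, max_n=2):
        """Count the skip-ngrams (ordered subsequences, gaps allowed) of each
        length 1..max_n shared by a candidate and a reference, counting every
        matching pair of positions, in O(len(candidate)*len(reference)*max_n)
        time. Illustrative sketch only: BLANC's ACS additionally weights each
        match with trainable parameters, which are omitted here."""
        lc, lr = len(candidate), len(reference)
        # counts[i][j][k] = number of common skip-k-grams between
        # candidate[:i] and reference[:j]
        counts = [[[0] * (max_n + 1) for _ in range(lr + 1)] for _ in range(lc + 1)]
        for i in range(lc + 1):
            for j in range(lr + 1):
                counts[i][j][0] = 1  # the empty subsequence is always common
        for i in range(1, lc + 1):
            for j in range(1, lr + 1):
                for k in range(1, max_n + 1):
                    # inclusion-exclusion over dropping the last candidate
                    # token or the last reference token
                    counts[i][j][k] = (counts[i - 1][j][k]
                                       + counts[i][j - 1][k]
                                       - counts[i - 1][j - 1][k])
                    if candidate[i - 1] == reference[j - 1]:
                        # extend every common skip-(k-1)-gram with this
                        # matched pair of final tokens
                        counts[i][j][k] += counts[i - 1][j - 1][k - 1]
        return {k: counts[lc][lr][k] for k in range(1, max_n + 1)}

    if __name__ == "__main__":
        candidate = "the cat sat on the mat".split()
        reference = "the cat was sitting on the mat".split()
        print(common_skip_ngram_counts(candidate, reference, max_n=2))

With max_n = 2 and uniform weights, these counts are essentially the skip-bigram statistics behind ROUGE-S (with no limit on skip distance), which is consistent with the paper's claim that the ROUGE and BLEU metric families arise as special cases of BLANC.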

References

  1. Y. Akiba, K. Imamura, and E. Sumita. 2001. Using multiple edit distances to automatically rank machine translation output. MT Summit VIII.
  2. C. Culy and S. Z. Riehemann. 2003. The limits of n-gram translation evaluation metrics. Machine Translation Summit IX.
  3. G. Doddington. 2002. Automatic evaluation of machine translation quality using n-gram co-occurrence statistics. Human Language Technology Conference (HLT).
  4. V. I. Levenshtein. 1965. Binary codes capable of correcting deletions, insertions, and reversals. Doklady Akademii Nauk SSSR.
  5. C. Y. Lin and F. J. Och. 2004. Automatic evaluation of machine translation quality using longest common subsequence and skip-bigram statistics. ACL.
  6. S. Niessen, F. J. Och, G. Leusch, and H. Ney. 2000. An evaluation tool for machine translation: fast evaluation for MT research. LREC.
  7. K. Papineni, S. Roukos, T. Ward, and W. J. Zhu. 2001. BLEU: a method for automatic evaluation of machine translation. IBM Research Report.
  8. R. Soricut and E. Brill. 2004. A unified framework for automatic evaluation using n-gram co-occurrence statistics. ACL.
  9. K. Y. Su, M. W. Wu, and J. S. Chang. 1992. A new quantitative quality measure for machine translation systems. COLING.
  10. J. P. Turian, L. Shen, and I. D. Melamed. 2003. Evaluation of machine translation and its evaluation. MT Summit IX.
  11. C. J. van Rijsbergen. 1979. Information Retrieval.

    • Published in

      HLT '05: Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
      October 2005
      1054 pages

      Publisher

      Association for Computational Linguistics

      United States

      Publication History

      • Published: 6 October 2005

      Qualifiers

      • Article

      Acceptance Rates

      • HLT '05 Paper Acceptance Rate: 127 of 402 submissions, 32%
      • Overall Acceptance Rate: 240 of 768 submissions, 31%
