ABSTRACT
We introduce BLANC, a family of dynamic, trainable evaluation metrics for machine translation. Flexible, parametrized models can be learned from past data and automatically optimized to correlate well with human judgments for different criteria (e.g. adequacy, fluency) using different correlation measures. Towards this end, we discuss ACS (all common skip-ngrams), a practical algorithm with trainable parameters that estimates reference-candidate translation overlap by computing a weighted sum of all common skip-ngrams in polynomial time. We show that the BLEU and ROUGE metric families are special cases of BLANC, and we compare correlations with human judgments across these three metric families. We analyze the algorithmic complexity of ACS and argue that it is more powerful in modeling both local meaning and sentence-level structure, while offering the same practicality as the established algorithms it generalizes.
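To make the skip-ngram overlap idea concrete, the sketch below computes a skip-bigram F-measure between a candidate and a reference, in the style of ROUGE-S. This is only an illustration of the simplest special case that ACS generalizes: ACS extends this to a weighted sum over common skip-ngrams of all lengths via dynamic programming, with trainable weights, and the function and variable names here are our own, not the paper's.

```python
from itertools import combinations
from collections import Counter

def skip_bigrams(tokens):
    """Multiset of all ordered token pairs (skip-bigrams), any gap size."""
    return Counter(combinations(tokens, 2))

def skip_bigram_f1(candidate, reference):
    """F-measure over skip-bigrams common to candidate and reference.

    Illustrative only: ACS replaces this unweighted bigram count with a
    parametrized weighted sum over common skip-ngrams of every length.
    """
    cand = skip_bigrams(candidate.split())
    ref = skip_bigrams(reference.split())
    # Counter intersection keeps the min count of each shared skip-bigram.
    common = sum((cand & ref).values())
    if common == 0:
        return 0.0
    precision = common / sum(cand.values())
    recall = common / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

Because skip-bigrams allow arbitrary gaps, this score rewards sentence-level word order beyond contiguous n-gram matches, which is the property the abstract highlights.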
REFERENCES
- Y. Akiba, K. Imamura, and E. Sumita. 2001. Using multiple edit distances to automatically rank machine translation output. MT Summit VIII.
- C. Culy and S. Z. Riehemann. 2003. The limits of n-gram translation evaluation metrics. MT Summit IX.
- G. Doddington. 2002. Automatic evaluation of machine translation quality using n-gram co-occurrence statistics. Human Language Technology Conference (HLT).
- V. I. Levenshtein. 1965. Binary codes capable of correcting deletions, insertions, and reversals. Doklady Akademii Nauk SSSR.
- C. Y. Lin and F. J. Och. 2004. Automatic evaluation of machine translation quality using longest common subsequence and skip bigram statistics. ACL.
- S. Niessen, F. J. Och, G. Leusch, and H. Ney. 2000. An evaluation tool for machine translation: fast evaluation for MT research. LREC.
- K. Papineni, S. Roukos, T. Ward, and W. J. Zhu. 2001. Bleu: a method for automatic evaluation of machine translation. IBM Research Report.
- R. Soricut and E. Brill. 2004. A unified framework for automatic evaluation using n-gram co-occurrence statistics. ACL.
- K. Y. Su, M. W. Wu, and J. S. Chang. 1992. A new quantitative quality measure for machine translation systems. COLING.
- J. P. Turian, L. Shen, and I. D. Melamed. 2003. Evaluation of machine translation and its evaluation. MT Summit IX.
- C. J. Van Rijsbergen. 1979. Information Retrieval. Butterworths.
- BLANC: learning evaluation metrics for MT